Author: mattmann
Date: Fri Nov 11 00:09:11 2016
New Revision: 1769231
URL: http://svn.apache.org/viewvc?rev=1769231&view=rev
Log:
Generate index for Apache Tika 1.14.
Added:
    tika/site/src/site/apt/1.14/index.apt
Added: tika/site/src/site/apt/1.14/index.apt
URL: http://svn.apache.org/viewvc/tika/site/src/site/apt/1.14/index.apt?rev=1769231&view=auto
==============================================================================
--- tika/site/src/site/apt/1.14/index.apt (added)
+++ tika/site/src/site/apt/1.14/index.apt Fri Nov 11 00:09:11 2016
@@ -0,0 +1,156 @@
+                     ----------------
+                     Apache Tika 1.14
+                     ----------------
+
+~~ Licensed to the Apache Software Foundation (ASF) under one or more
+~~ contributor license agreements. See the NOTICE file distributed with
+~~ this work for additional information regarding copyright ownership.
+~~ The ASF licenses this file to You under the Apache License, Version 2.0
+~~ (the "License"); you may not use this file except in compliance with
+~~ the License. You may obtain a copy of the License at
+~~
+~~ http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License.
+
+Apache Tika 1.14
+
+   The most notable changes in Tika 1.14 over the previous release are:
+
+   * Extract all headers from MSG/RFC822 ({{{http://issues.apache.org/jira/browse/TIKA-2122}TIKA-2122}}).
+
+   * 9.1 ({{{http://issues.apache.org/jira/browse/TIKA-2113}TIKA-2113}}).
+
+   * Extract PDF DocInfo metadata into separate keys to prevent overwriting by XMP metadata ({{{http://issues.apache.org/jira/browse/TIKA-2057}TIKA-2057}}).
+
+   * Re-enable fileUrl for tika-server ({{{http://issues.apache.org/jira/browse/TIKA-2081}TIKA-2081}}). If you choose to use this feature, beware of the security vulnerabilities! See: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-3271
+
+   * Add Tesseract's hOCR output format as an option, via Eric Pugh ({{{http://issues.apache.org/jira/browse/TIKA-2093}TIKA-2093}}).
+
+   * Extract macros from MSOffice files ({{{http://issues.apache.org/jira/browse/TIKA-2069}TIKA-2069}}).
+
+   * Maintain passed-in mime in TXTParser ({{{http://issues.apache.org/jira/browse/TIKA-2047}TIKA-2047}}).
+
+   * Upgrade to POI 3.15 ({{{http://issues.apache.org/jira/browse/TIKA-2013}TIKA-2013}}).
+
+   * 0.3 ({{{http://issues.apache.org/jira/browse/TIKA-2051}TIKA-2051}}).
+
+   * Fix hyperlinks with formatting in DOC and DOCX ({{{http://issues.apache.org/jira/browse/TIKA-1255}TIKA-1255}} and {{{http://issues.apache.org/jira/browse/TIKA-2078}TIKA-2078}}).
+
+   * Tika is now integrated with Google's TensorFlow library and can use its Inception v3 image classification model to identify objects in images ({{{http://issues.apache.org/jira/browse/TIKA-1993}TIKA-1993}}).
+
+   * Parser configuration is now type-safe, and parameters for parsers can have assigned types ({{{http://issues.apache.org/jira/browse/TIKA-1508}TIKA-1508}}, {{{http://issues.apache.org/jira/browse/TIKA-1986}TIKA-1986}}).
+
+   * Prevent OOM/permanent hang on some corrupt CHM files ({{{http://issues.apache.org/jira/browse/TIKA-2040}TIKA-2040}}).
+
+   * Upgrade ICU4J charset detection components to fix multithreading bug ({{{http://issues.apache.org/jira/browse/TIKA-2041}TIKA-2041}}).
+
+   * 1.4 ({{{http://issues.apache.org/jira/browse/TIKA-2039}TIKA-2039}}).
+
+   * Maintain more significant digits in cells of "General" format in XLS and XLSX ({{{http://issues.apache.org/jira/browse/TIKA-2025}TIKA-2025}}).
+
+   * Avoid mark/reset issues when extracting or detecting embedded resources in RFC822 emails ({{{http://issues.apache.org/jira/browse/TIKA-2037}TIKA-2037}}).
+
+   * Improve accuracy of Tesseract for better extraction of numeric and alphanumeric text from images ({{{http://issues.apache.org/jira/browse/TIKA-2021}TIKA-2021}}, {{{http://issues.apache.org/jira/browse/TIKA-2031}TIKA-2031}}).
+
+   * Improve extraction of embedded documents from PPT, PPTX and XLSX ({{{http://issues.apache.org/jira/browse/TIKA-2026}TIKA-2026}}).
+
+   * Add parser for applefile (AppleSingle) ({{{http://issues.apache.org/jira/browse/TIKA-2022}TIKA-2022}}).
+
+   * Add mime types, mime magic and/or globs for:
+
+      ** Endnote Import File ({{{http://issues.apache.org/jira/browse/TIKA-2011}TIKA-2011}})
+
+      ** DJVU files ({{{http://issues.apache.org/jira/browse/TIKA-2009}TIKA-2009}})
+
+      ** MS Owner File ({{{http://issues.apache.org/jira/browse/TIKA-2008}TIKA-2008}})
+
+      ** Windows Media Metafile ({{{http://issues.apache.org/jira/browse/TIKA-2004}TIKA-2004}})
+
+      ** iCal and vCalendar ({{{http://issues.apache.org/jira/browse/TIKA-2006}TIKA-2006}})
+
+      ** MBOX ({{{http://issues.apache.org/jira/browse/TIKA-2042}TIKA-2042}})
+
+      ** Stata DTA ({{{http://issues.apache.org/jira/browse/TIKA-2064}TIKA-2064}})
+
+   * Add configurable maximum threshold for number of events extracted from the XMP Media Management Schema in JempboxExtractor ({{{http://issues.apache.org/jira/browse/TIKA-1999}TIKA-1999}}).
+
+   * Integrate TesseractOCR with full page image rendering for PDFs ({{{http://issues.apache.org/jira/browse/TIKA-1994}TIKA-1994}}).
+
+   * Add mime detection (via Nick C) and parser for DBF files ({{{http://issues.apache.org/jira/browse/TIKA-1513}TIKA-1513}}).
+
+   * Add mime detection and parsers for MSOffice 2003 XML Word and Excel formats ({{{http://issues.apache.org/jira/browse/TIKA-1958}TIKA-1958}}).
+
+   * Extract hyperlinks from PPT, PPTX and XLSX ({{{http://issues.apache.org/jira/browse/TIKA-1454}TIKA-1454}}).
+
+
+   The following people have contributed to Tika 1.14 by submitting or
+   commenting on the issues resolved in this release:
+
+   * Aeham Abushwashi
+
+   * Alan Hunter
+
+   * Alexander Kazakov
+
+   * Chris A. Mattmann
+
+   * Chris Knott
+
+   * Egbert
+
+   * Eli Trucco
+
+   * Eric Pugh
+
+   * Jean Coudon
+
+   * Jeff Swindle
+
+   * John Dougrez-Lewis
+
+   * John Haynes
+
+   * Joseph Naegele
+
+   * Josh Cummings
+
+   * Ken Krugler
+
+   * Kukushkin Alexander
+
+   * Lewis John McGibbney
+
+   * Luis Filipe Nassif
+
+   * Matthias Pigulla
+
+   * Nam-Quang Tran
+
+   * Nilay Chheda
+
+   * Philipp Steinkrueger
+
+   * Sara Miller
+
+   * Sebastian Iturra
+
+   * Thamme Gowda
+
+   * Tilman Hausherr
+
+   * Tim Allison
+
+   * Tim Barrett
+
+   * Vjeran Marcinko
+
+   * Yahav Amsalem
+
+   * Zarana Parekh
+
+   See {{https://s.apache.org/TRWa}} for more details on these contributions.