+1

OSX 10.9.3, Java 1.7

Tyler

On Mon, Jul 28, 2014 at 7:09 AM, Allison, Timothy B. <[email protected]> wrote:

> +1
>
> Linux version 2.6.32-431.5.1.el6.x86_64: Java 1.6 and 1.7
> Windows 7, Java 1.7
>
> I also ran Tika 1.5 and 1.6 rc1 against a random selection of 10,000 docs
> (all formats) plus all available msoffice-x files in govdocs1, yielding
> 10,413 docs. There were several improvements in text extraction for PDFs
> (mostly spacing) and 4 fewer exceptions (2 ppt, 1 doc and 1 pdf).
>
> There was one regression:
> http://digitalcorpora.org/corp/nps/files/govdocs1/268/268620.pptx
>
> Stacktrace:
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -369073454
>     at java.lang.String.checkBounds(String.java:371)
>     at java.lang.String.<init>(String.java:415)
>     at org.apache.poi.util.StringUtil.getFromCompressedUnicode(StringUtil.java:114)
>     at org.apache.poi.poifs.filesystem.Ole10Native.<init>(Ole10Native.java:163)
>     at org.apache.poi.poifs.filesystem.Ole10Native.createFromEmbeddedOleObject(Ole10Native.java:91)
>     at org.apache.poi.poifs.filesystem.Ole10Native.createFromEmbeddedOleObject(Ole10Native.java:63)
>     at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedOLE(AbstractOOXMLExtractor.java:250)
>     at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:199)
>     at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:115)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:243)
>
> -----Original Message-----
> From: Mattmann, Chris A (3980) [mailto:[email protected]]
> Sent: Monday, July 28, 2014 12:22 AM
> To: [email protected]
> Cc: [email protected]
> Subject: [VOTE] Apache Tika 1.6 release candidate #1
>
> Hi Folks,
>
> A candidate for the Tika 1.6 release is available at:
>
> http://people.apache.org/~mattmann/apache-tika-1.6/rc1/
>
> The release candidate is a zip archive of the sources in:
>
> http://svn.apache.org/repos/asf/tika/tags/1.6/
>
> The SHA1 checksum of the archive is
> 076ad343be56a540a4c8e395746fa4fda5b5b6d3.
>
> A Maven staging repository is available at:
>
> https://repository.apache.org/content/repositories/orgapachetika-1003/
>
> Please vote on releasing this package as Apache Tika 1.6.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.6
> [ ] -1 Do not release this package because...
>
> Thank you!
>
> Cheers,
> Chris
>
> P.S. Here is my +1!
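For voters who want to check the downloaded archive against the SHA1 checksum quoted in the vote message, here is a minimal Java sketch (it sticks to Java 7 APIs, matching the JVMs tested in this thread). It streams the file through `java.security.MessageDigest` and compares the lowercase hex digest with the published value. The class name `Sha1Check` and the default file name `tika-1.6-src.zip` are hypothetical; pass the path of the zip you actually downloaded from the rc1 directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha1Check {

    // Stream the file through a SHA-1 digest and return it as lowercase hex.
    static String sha1Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b)); // two hex digits per byte
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Checksum published in the [VOTE] message for Tika 1.6 rc1.
        String expected = "076ad343be56a540a4c8e395746fa4fda5b5b6d3";
        // Hypothetical default file name; supply the real path as args[0].
        Path archive = Paths.get(args.length > 0 ? args[0] : "tika-1.6-src.zip");
        String actual = sha1Hex(archive);
        System.out.println(actual.equals(expected) ? "SHA1 OK" : "SHA1 MISMATCH: " + actual);
    }
}
```

Usage: `java Sha1Check /path/to/downloaded.zip`. A mismatch means the download is corrupt or is not the archive the vote refers to.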
