+1

OSX 10.9.3, Java 1.7

Tyler


On Mon, Jul 28, 2014 at 7:09 AM, Allison, Timothy B. <[email protected]>
wrote:

> +1
>
> Linux version 2.6.32-431.5.1.el6.x86_64: Java 1.6 and 1.7
> Windows 7, Java 1.7
>
> I also ran Tika 1.5 and 1.6 rc1 against a random selection of 10,000 docs
> (all formats) plus all available msoffice-x files in govdocs1, yielding
> 10,413 docs.  There were several improvements in text extraction for PDFs
> (mostly spacing) and 4 fewer exceptions (2 ppt, 1 doc and 1 pdf).
>
> There was one regression:
> http://digitalcorpora.org/corp/nps/files/govdocs1/268/268620.pptx
>
> Stacktrace:
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -369073454
>         at java.lang.String.checkBounds(String.java:371)
>         at java.lang.String.<init>(String.java:415)
>         at org.apache.poi.util.StringUtil.getFromCompressedUnicode(StringUtil.java:114)
>         at org.apache.poi.poifs.filesystem.Ole10Native.<init>(Ole10Native.java:163)
>         at org.apache.poi.poifs.filesystem.Ole10Native.createFromEmbeddedOleObject(Ole10Native.java:91)
>         at org.apache.poi.poifs.filesystem.Ole10Native.createFromEmbeddedOleObject(Ole10Native.java:63)
>         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedOLE(AbstractOOXMLExtractor.java:250)
>         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:199)
>         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:115)
>         at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112)
>         at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:243)
>
>
> -----Original Message-----
> From: Mattmann, Chris A (3980) [mailto:[email protected]]
> Sent: Monday, July 28, 2014 12:22 AM
> To: [email protected]
> Cc: [email protected]
> Subject: [VOTE] Apache Tika 1.6 release candidate #1
>
> Hi Folks,
>
> A candidate for the Tika 1.6 release is available at:
>
> http://people.apache.org/~mattmann/apache-tika-1.6/rc1/
>
>
> The release candidate is a zip archive of the sources in:
>
>     http://svn.apache.org/repos/asf/tika/tags/1.6/
>
> The SHA1 checksum of the archive is
> 076ad343be56a540a4c8e395746fa4fda5b5b6d3.
>
> A Maven staging repository is available at:
>
> https://repository.apache.org/content/repositories/orgapachetika-1003/
>
>
> Please vote on releasing this package as Apache Tika 1.6.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
>     [ ] +1 Release this package as Apache Tika 1.6
>     [ ] -1 Do not release this package because...
>
> Thank you!
>
> Cheers,
> Chris
>
> P.S. Here is my +1!
>
>
>
>
>
>
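
For anyone checking the archive before voting, the SHA1 checksum quoted
above can be compared against a local download. The following is only a
minimal sketch using the standard java.security.MessageDigest API; the
class name Sha1Check and the command-line arguments are illustrative
assumptions, not part of the release tooling.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; the name is an assumption for illustration.
public class Sha1Check {

    /** Streams the input through SHA-1 and returns the digest as lowercase hex. */
    static String sha1Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            // Mask to 0..255 so negative bytes still print as two hex digits.
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java Sha1Check <archive-path> <expected-sha1>");
            return;
        }
        // args[0]: path to the downloaded archive (local file name is up to you)
        // args[1]: expected checksum, e.g. the value from the vote mail
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            String actual = sha1Hex(in);
            System.out.println(actual.equalsIgnoreCase(args[1])
                    ? "checksum OK"
                    : "MISMATCH: " + actual);
        }
    }
}
```

Run it with the path of the downloaded zip as the first argument and the
checksum from the vote mail as the second.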

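The stack trace quoted above shows a String constructor rejecting a
negative length (-369073454). One common way to get there is a length
field read from corrupt or unexpected data and passed on unchecked. The
sketch below illustrates that general failure mode and a bounds check
that turns it into a clearer error. It is an assumption-laden
illustration, not the POI code: the class LengthPrefixedString, its
methods, and the 4-byte little-endian layout are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; it does not mirror Ole10Native or StringUtil.
public class LengthPrefixedString {

    /** Reads the assumed 4-byte little-endian length prefix. */
    private static int readLength(byte[] data) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    /** Trusts the length field; a negative value reaches the String constructor. */
    static String readUnchecked(byte[] data) {
        int len = readLength(data);
        return new String(data, 4, len, StandardCharsets.ISO_8859_1);
    }

    /** Validates the length field against the available bytes before decoding. */
    static String readChecked(byte[] data) {
        if (data.length < 4) {
            throw new IllegalArgumentException("truncated length field");
        }
        int len = readLength(data);
        if (len < 0 || len > data.length - 4) {
            throw new IllegalArgumentException("implausible string length: " + len);
        }
        return new String(data, 4, len, StandardCharsets.ISO_8859_1);
    }
}
```

With a length prefix of 0xFFFFFFFF, readUnchecked fails with a
StringIndexOutOfBoundsException, while readChecked reports the
implausible length directly.
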