All,
On a related note, I did some digging on the one regression I found in the
pptx, and that will be solved if we wait for POI 3.11 beta 1. I haven't yet
had a chance to rerun on the random sample with the updated POI...
Best,
Tim
-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:[email protected]]
Sent: Thursday, July 31, 2014 2:30 PM
To: [email protected]
Subject: Re: [VOTE] Apache Tika 1.6 release candidate #1
Guys, based on all the comments here, I am going to roll an
RC #2 to address:
- Tyler's comment about getting the MicrosoftTranslator fix incorporated.
- Dave's Lingo24 API plugin for translate
- Nick's POI updates
I'll probably roll RC #2 on Monday.
Thanks!
Cheers,
Chris
P.S. When I do, I'll diff trunk against the branch and roll any trunk
updates made after the 1.6 branch was cut into the new 1.6 RC #2.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-----Original Message-----
From: <Mattmann>, Chris Mattmann <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, July 28, 2014 11:45 AM
To: "[email protected]" <[email protected]>
Subject: Re: [VOTE] Apache Tika 1.6 release candidate #1
>Thanks Sergey - I pushed to 1.7 since we have been having a DISCUSS
>thread for a few weeks about getting 1.6 out. Do you have a patch right
>now for TIKA-1367? If so I'm happy to incorporate it and roll an RC #2
>to get it in. If you don't have a patch yet, would you mind terribly if
>we pushed out 1.6, which already today has a ton of great updates, then
>shortly thereafter rolled a 1.7 (or did so when you finished with
>TIKA-1367)?
>
>Cheers,
>Chris
>
>-----Original Message-----
>From: Sergey Beryozkin <[email protected]>
>Reply-To: "[email protected]" <[email protected]>
>Date: Monday, July 28, 2014 11:38 AM
>To: "[email protected]" <[email protected]>
>Subject: Re: [VOTE] Apache Tika 1.6 release candidate #1
>
>>+0, given that the tika-parsers dependencies documentation issue [1]
>>appears to have been pushed back. I'm confused as to why.
>>
>>Thanks. Sergey
>>
>>[1] https://issues.apache.org/jira/browse/TIKA-1367
>>
>>On 28/07/14 17:16, Tyler Palsulich wrote:
>>> +1
>>>
>>> OSX 10.9.3, Java 1.7
>>>
>>> Tyler
>>>
>>>
>>> On Mon, Jul 28, 2014 at 7:09 AM, Allison, Timothy B. <[email protected]>
>>> wrote:
>>>
>>>> +1
>>>>
>>>> Linux version 2.6.32-431.5.1.el6.x86_64: Java 1.6 and 1.7
>>>> Windows 7, Java 1.7
>>>>
>>>> I also ran Tika 1.5 and 1.6 rc1 against a random selection of 10,000 docs
>>>> (all formats) plus all available msoffice-x files in govdocs1, yielding
>>>> 10,413 docs. There were several improvements in text extraction for PDFs
>>>> (mostly spacing) and 4 fewer exceptions (2 ppt, 1 doc and 1 pdf).
>>>>
>>>> There was one regression:
>>>> http://digitalcorpora.org/corp/nps/files/govdocs1/268/268620.pptx
>>>>
>>>> Stacktrace:
>>>> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -369073454
>>>>   at java.lang.String.checkBounds(String.java:371)
>>>>   at java.lang.String.<init>(String.java:415)
>>>>   at org.apache.poi.util.StringUtil.getFromCompressedUnicode(StringUtil.java:114)
>>>>   at org.apache.poi.poifs.filesystem.Ole10Native.<init>(Ole10Native.java:163)
>>>>   at org.apache.poi.poifs.filesystem.Ole10Native.createFromEmbeddedOleObject(Ole10Native.java:91)
>>>>   at org.apache.poi.poifs.filesystem.Ole10Native.createFromEmbeddedOleObject(Ole10Native.java:63)
>>>>   at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedOLE(AbstractOOXMLExtractor.java:250)
>>>>   at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:199)
>>>>   at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:115)
>>>>   at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112)
>>>>   at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
>>>>   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:243)
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Mattmann, Chris A (3980) [mailto:[email protected]]
>>>> Sent: Monday, July 28, 2014 12:22 AM
>>>> To: [email protected]
>>>> Cc: [email protected]
>>>> Subject: [VOTE] Apache Tika 1.6 release candidate #1
>>>>
>>>> Hi Folks,
>>>>
>>>> A candidate for the Tika 1.6 release is available at:
>>>>
>>>> http://people.apache.org/~mattmann/apache-tika-1.6/rc1/
>>>>
>>>>
>>>> The release candidate is a zip archive of the sources in:
>>>>
>>>> http://svn.apache.org/repos/asf/tika/tags/1.6/
>>>>
>>>> The SHA1 checksum of the archive is
>>>> 076ad343be56a540a4c8e395746fa4fda5b5b6d3.
>>>>
>>>> A Maven staging repository is available at:
>>>>
>>>> https://repository.apache.org/content/repositories/orgapachetika-1003/
>>>>
>>>>
>>>> Please vote on releasing this package as Apache Tika 1.6.
>>>> The vote is open for the next 72 hours and passes if a majority of at
>>>> least three +1 Tika PMC votes are cast.
>>>>
>>>> [ ] +1 Release this package as Apache Tika 1.6
>>>> [ ] -1 Do not release this package because...
>>>>
>>>> Thank you!
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>> P.S. Here is my +1!
>>>
>