Thanks Sergey - I pushed TIKA-1367 to 1.7 since we have had a DISCUSS
thread going for a few weeks about getting 1.6 out. Do you have a patch
for TIKA-1367 right now? If so, I'm happy to incorporate it and roll RC #2
to get it in. If you don't have a patch yet, would you mind terribly if
we pushed out 1.6, which already has a ton of great updates, and then
rolled 1.7 shortly thereafter (or once you finished TIKA-1367)?

Cheers,
Chris


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Sergey Beryozkin <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, July 28, 2014 11:38 AM
To: "[email protected]" <[email protected]>
Subject: Re: [VOTE] Apache Tika 1.6 release candidate #1

>+0, given that the tika-parsers dependencies documentation issue [1]
>appears to have been pushed back. I'm confused as to why.
>
>Thanks. Sergey
>
>[1] https://issues.apache.org/jira/browse/TIKA-1367
>
>On 28/07/14 17:16, Tyler Palsulich wrote:
>> +1
>>
>> OSX 10.9.3, Java 1.7
>>
>> Tyler
>>
>>
>> On Mon, Jul 28, 2014 at 7:09 AM, Allison, Timothy B. <[email protected]> wrote:
>>
>>> +1
>>>
>>> Linux version 2.6.32-431.5.1.el6.x86_64: Java 1.6 and 1.7
>>> Windows 7, Java 1.7
>>>
>>> I also ran Tika 1.5 and 1.6 RC1 against a random selection of 10,000 docs
>>> (all formats) plus all available msoffice-x files in govdocs1, yielding
>>> 10,413 docs. There were several improvements in text extraction for PDFs
>>> (mostly spacing) and 4 fewer exceptions (2 ppt, 1 doc and 1 pdf).
>>>
>>> There was one regression:
>>> http://digitalcorpora.org/corp/nps/files/govdocs1/268/268620.pptx
>>>
>>> Stacktrace:
>>> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -369073454
>>>          at java.lang.String.checkBounds(String.java:371)
>>>          at java.lang.String.<init>(String.java:415)
>>>          at org.apache.poi.util.StringUtil.getFromCompressedUnicode(StringUtil.java:114)
>>>          at org.apache.poi.poifs.filesystem.Ole10Native.<init>(Ole10Native.java:163)
>>>          at org.apache.poi.poifs.filesystem.Ole10Native.createFromEmbeddedOleObject(Ole10Native.java:91)
>>>          at org.apache.poi.poifs.filesystem.Ole10Native.createFromEmbeddedOleObject(Ole10Native.java:63)
>>>          at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedOLE(AbstractOOXMLExtractor.java:250)
>>>          at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:199)
>>>          at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:115)
>>>          at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112)
>>>          at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
>>>          at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:243)
>>>
>>>
>>> -----Original Message-----
>>> From: Mattmann, Chris A (3980) [mailto:[email protected]]
>>> Sent: Monday, July 28, 2014 12:22 AM
>>> To: [email protected]
>>> Cc: [email protected]
>>> Subject: [VOTE] Apache Tika 1.6 release candidate #1
>>>
>>> Hi Folks,
>>>
>>> A candidate for the Tika 1.6 release is available at:
>>>
>>> http://people.apache.org/~mattmann/apache-tika-1.6/rc1/
>>>
>>>
>>> The release candidate is a zip archive of the sources in:
>>>
>>>      http://svn.apache.org/repos/asf/tika/tags/1.6/
>>>
>>> The SHA1 checksum of the archive is
>>> 076ad343be56a540a4c8e395746fa4fda5b5b6d3.
>>>
>>> A Maven staging repository is available at:
>>>
>>> https://repository.apache.org/content/repositories/orgapachetika-1003/
>>>
>>>
>>> Please vote on releasing this package as Apache Tika 1.6.
>>> The vote is open for the next 72 hours and passes if a majority of at
>>> least three +1 Tika PMC votes are cast.
>>>
>>>      [ ] +1 Release this package as Apache Tika 1.6
>>>      [ ] -1 Do not release this package because...
>>>
>>> Thank you!
>>>
>>> Cheers,
>>> Chris
>>>
>>> P.S. Here is my +1!
>>>
>>
