OK, RC #2 is coming up shortly. I just brought the branch up to date in
r1621623 and also cleaned up JIRA.

Here goes...

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: <Mattmann>, Chris Mattmann <[email protected]>
Date: Thursday, July 31, 2014 11:29 AM
To: "[email protected]" <[email protected]>
Subject: Re: [VOTE] Apache Tika 1.6 release candidate #1

>Guys, based on all the comments here, I am going to roll an
>RC #2 to address:
>
>- Tyler's comment about getting the MicrosoftTranslator fix incorporated.
>- Dave's Lingo24 API plugin for translate
>- Nick's POI updates
>
>I'll probably roll RC #2 on Monday.
>
>Thanks!
>
>Cheers,
>Chris
>
>P.S. When I do, I'll diff trunk against the branch and then roll any
>trunk updates made after the 1.6 branch into the new 1.6 RC #2.
>
>-----Original Message-----
>From: <Mattmann>, Chris Mattmann <[email protected]>
>Reply-To: "[email protected]" <[email protected]>
>Date: Monday, July 28, 2014 11:45 AM
>To: "[email protected]" <[email protected]>
>Subject: Re: [VOTE] Apache Tika 1.6 release candidate #1
>
>>Thanks Sergey - I pushed to 1.7 since we have been having a DISCUSS
>>thread for a few weeks about getting 1.6 out. Do you have a patch right
>>now for TIKA-1367? If so I'm happy to incorporate it and roll an RC #2
>>to get it in. If you don't have a patch yet, would you mind terribly if
>>we pushed out 1.6, which already today has a ton of great updates, then
>>shortly thereafter rolled a 1.7 (or did so when you finished with
>>TIKA-1367)?
>>
>>Cheers,
>>Chris
>>
>>
>>-----Original Message-----
>>From: Sergey Beryozkin <[email protected]>
>>Reply-To: "[email protected]" <[email protected]>
>>Date: Monday, July 28, 2014 11:38 AM
>>To: "[email protected]" <[email protected]>
>>Subject: Re: [VOTE] Apache Tika 1.6 release candidate #1
>>
>>>+0, given that the tika-parsers dependencies documentation issue [1]
>>>appears to have been pushed out. I'm confused as to why.
>>>
>>>Thanks. Sergey
>>>
>>>[1] https://issues.apache.org/jira/browse/TIKA-1367
>>>
>>>On 28/07/14 17:16, Tyler Palsulich wrote:
>>>> +1
>>>>
>>>> OSX 10.9.3, Java 1.7
>>>>
>>>> Tyler
>>>>
>>>>
>>>> On Mon, Jul 28, 2014 at 7:09 AM, Allison, Timothy B. <[email protected]> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> Linux version 2.6.32-431.5.1.el6.x86_64: Java 1.6 and 1.7
>>>>> Windows 7, Java 1.7
>>>>>
>>>>> I also ran Tika 1.5 and 1.6 rc1 against a random selection of 10,000
>>>>> docs (all formats) plus all available msoffice-x files in govdocs1,
>>>>> yielding 10,413 docs.  There were several improvements in text
>>>>> extraction for PDFs (mostly spacing) and 4 fewer exceptions (2 ppt,
>>>>> 1 doc and 1 pdf).
>>>>>
>>>>> There was one regression:
>>>>> http://digitalcorpora.org/corp/nps/files/govdocs1/268/268620.pptx
>>>>>
>>>>> Stacktrace:
>>>>> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -369073454
>>>>>          at java.lang.String.checkBounds(String.java:371)
>>>>>          at java.lang.String.<init>(String.java:415)
>>>>>          at org.apache.poi.util.StringUtil.getFromCompressedUnicode(StringUtil.java:114)
>>>>>          at org.apache.poi.poifs.filesystem.Ole10Native.<init>(Ole10Native.java:163)
>>>>>          at org.apache.poi.poifs.filesystem.Ole10Native.createFromEmbeddedOleObject(Ole10Native.java:91)
>>>>>          at org.apache.poi.poifs.filesystem.Ole10Native.createFromEmbeddedOleObject(Ole10Native.java:63)
>>>>>          at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedOLE(AbstractOOXMLExtractor.java:250)
>>>>>          at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:199)
>>>>>          at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:115)
>>>>>          at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112)
>>>>>          at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
>>>>>          at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:243)
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Mattmann, Chris A (3980) [mailto:[email protected]]
>>>>> Sent: Monday, July 28, 2014 12:22 AM
>>>>> To: [email protected]
>>>>> Cc: [email protected]
>>>>> Subject: [VOTE] Apache Tika 1.6 release candidate #1
>>>>>
>>>>> Hi Folks,
>>>>>
>>>>> A candidate for the Tika 1.6 release is available at:
>>>>>
>>>>> http://people.apache.org/~mattmann/apache-tika-1.6/rc1/
>>>>>
>>>>>
>>>>> The release candidate is a zip archive of the sources in:
>>>>>
>>>>>      http://svn.apache.org/repos/asf/tika/tags/1.6/
>>>>>
>>>>> The SHA1 checksum of the archive is
>>>>> 076ad343be56a540a4c8e395746fa4fda5b5b6d3.
>>>>>
>>>>> A Maven staging repository is available at:
>>>>>
>>>>> https://repository.apache.org/content/repositories/orgapachetika-1003/
>>>>>
>>>>>
>>>>> Please vote on releasing this package as Apache Tika 1.6.
>>>>> The vote is open for the next 72 hours and passes if a majority of at
>>>>> least three +1 Tika PMC votes are cast.
>>>>>
>>>>>      [ ] +1 Release this package as Apache Tika 1.6
>>>>>      [ ] -1 Do not release this package because...
>>>>>
>>>>> Thank you!
>>>>>
>>>>> Cheers,
>>>>> Chris
>>>>>
>>>>> P.S. Here is my +1!
>>>>>
>>>>
>>
>
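
For anyone checking a candidate against the SHA1 quoted in the vote email,
here is a minimal sketch of the comparison in Python. The helper names and
the archive file name in the usage comment are my own assumptions for
illustration, not part of the release tooling, and the checksum shown is
the one published for RC #1, so it will not match a later candidate.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """Compare a file's SHA1 with a published hex digest, ignoring case and whitespace."""
    return sha1_of_file(path) == published_hex.strip().lower()


# Usage (the archive name is hypothetical; use whatever file you downloaded):
# matches_published("tika-1.6-src.zip",
#                   "076ad343be56a540a4c8e395746fa4fda5b5b6d3")
```

A mismatch only says the downloaded bytes differ from what was published;
it does not say which side is wrong, so re-download before voting -1 on it.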
