Can we get TIKA-1404 in 1.6? Simple, but significant, fix.

Tyler

On Aug 31, 2014 3:54 PM, "Mattmann, Chris A (3980)" <[email protected]> wrote:
> Ugh, sorry. Maven release plugin issues, going to have to clean some
> stuff up here. Don't mind me folks.
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: <Mattmann>, Chris Mattmann <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Sunday, August 31, 2014 12:37 PM
> To: "[email protected]" <[email protected]>
> Subject: Re: [VOTE] Apache Tika 1.6 release candidate #1
>
> >OK RC #2 coming up shortly, just brought the branch up to date in
> >r1621623. Also cleaned up JIRA.
> >
> >Here goes..
> >
> >-----Original Message-----
> >From: <Mattmann>, Chris Mattmann <[email protected]>
> >Date: Thursday, July 31, 2014 11:29 AM
> >To: "[email protected]" <[email protected]>
> >Subject: Re: [VOTE] Apache Tika 1.6 release candidate #1
> >
> >>Guys, based on all the comments here, I am going to roll another
> >>RC #2 to address:
> >>
> >>- Tyler's comment about getting the MicrosoftTranslator fix incorporated.
> >>- Dave's Lingo24 API plugin for translate
> >>- Nick's POI updates
> >>
> >>I'll roll another RC #2 probably on Monday.
> >>
> >>Thanks!
> >>
> >>Cheers,
> >>Chris
> >>
> >>P.S. When I do, I'll diff trunk against the branch and then roll any
> >>trunk updates post branch to 1.6 into the new 1.6 RC #2.
> >>
> >>-----Original Message-----
> >>From: <Mattmann>, Chris Mattmann <[email protected]>
> >>Reply-To: "[email protected]" <[email protected]>
> >>Date: Monday, July 28, 2014 11:45 AM
> >>To: "[email protected]" <[email protected]>
> >>Subject: Re: [VOTE] Apache Tika 1.6 release candidate #1
> >>
> >>>Thanks Sergey - I pushed to 1.7 since we have been having a DISCUSS
> >>>thread for a few weeks about getting 1.6 out. Do you have a patch right
> >>>now for TIKA-1367? If so I'm happy to incorporate it and roll an RC #2
> >>>to get it in. If you don't have a patch yet, would you mind terribly if
> >>>we pushed out 1.6, which already today has a ton of great updates, then
> >>>shortly thereafter rolled a 1.7 (or did so when you finished with
> >>>TIKA-1367)?
> >>>
> >>>Cheers,
> >>>Chris
> >>>
> >>>-----Original Message-----
> >>>From: Sergey Beryozkin <[email protected]>
> >>>Reply-To: "[email protected]" <[email protected]>
> >>>Date: Monday, July 28, 2014 11:38 AM
> >>>To: "[email protected]" <[email protected]>
> >>>Subject: Re: [VOTE] Apache Tika 1.6 release candidate #1
> >>>
> >>>>+0 given that it appears that the tika-parsers dependencies
> >>>>documentation issue has been pushed away. I'm getting confused why.
> >>>>
> >>>>Thanks. Sergey
> >>>>
> >>>>[1] https://issues.apache.org/jira/browse/TIKA-1367
> >>>>
> >>>>On 28/07/14 17:16, Tyler Palsulich wrote:
> >>>>> +1
> >>>>>
> >>>>> OSX 10.9.3, Java 1.7
> >>>>>
> >>>>> Tyler
> >>>>>
> >>>>> On Mon, Jul 28, 2014 at 7:09 AM, Allison, Timothy B. <[email protected]> wrote:
> >>>>>
> >>>>>> +1
> >>>>>>
> >>>>>> Linux version 2.6.32-431.5.1.el6.x86_64: Java 1.6 and 1.7
> >>>>>> Windows 7, Java 1.7
> >>>>>>
> >>>>>> I also ran Tika 1.5 and 1.6 rc1 against a random selection of 10,000 docs
> >>>>>> (all formats) plus all available msoffice-x files in govdocs1, yielding
> >>>>>> 10,413 docs. There were several improvements in text extraction for PDFs
> >>>>>> (mostly spacing) and 4 fewer exceptions (2 ppt, 1 doc and 1 pdf).
> >>>>>>
> >>>>>> There was one regression:
> >>>>>> http://digitalcorpora.org/corp/nps/files/govdocs1/268/268620.pptx
> >>>>>>
> >>>>>> Stacktrace:
> >>>>>> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -369073454
> >>>>>>     at java.lang.String.checkBounds(String.java:371)
> >>>>>>     at java.lang.String.<init>(String.java:415)
> >>>>>>     at org.apache.poi.util.StringUtil.getFromCompressedUnicode(StringUtil.java:114)
> >>>>>>     at org.apache.poi.poifs.filesystem.Ole10Native.<init>(Ole10Native.java:163)
> >>>>>>     at org.apache.poi.poifs.filesystem.Ole10Native.createFromEmbeddedOleObject(Ole10Native.java:91)
> >>>>>>     at org.apache.poi.poifs.filesystem.Ole10Native.createFromEmbeddedOleObject(Ole10Native.java:63)
> >>>>>>     at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedOLE(AbstractOOXMLExtractor.java:250)
> >>>>>>     at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:199)
> >>>>>>     at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:115)
> >>>>>>     at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112)
> >>>>>>     at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
> >>>>>>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:243)
> >>>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Mattmann, Chris A (3980) [mailto:[email protected]]
> >>>>>> Sent: Monday, July 28, 2014 12:22 AM
> >>>>>> To: [email protected]
> >>>>>> Cc: [email protected]
> >>>>>> Subject: [VOTE] Apache Tika 1.6 release candidate #1
> >>>>>>
> >>>>>> Hi Folks,
> >>>>>>
> >>>>>> A candidate for the Tika 1.6 release is available at:
> >>>>>>
> >>>>>> http://people.apache.org/~mattmann/apache-tika-1.6/rc1/
> >>>>>>
> >>>>>> The release candidate is a zip archive of the sources in:
> >>>>>>
> >>>>>> http://svn.apache.org/repos/asf/tika/tags/1.6/
> >>>>>>
> >>>>>> The SHA1 checksum of the archive is
> >>>>>> 076ad343be56a540a4c8e395746fa4fda5b5b6d3.
> >>>>>>
> >>>>>> A Maven staging repository is available at:
> >>>>>>
> >>>>>> https://repository.apache.org/content/repositories/orgapachetika-1003/
> >>>>>>
> >>>>>> Please vote on releasing this package as Apache Tika 1.6.
> >>>>>> The vote is open for the next 72 hours and passes if a majority of at
> >>>>>> least three +1 Tika PMC votes are cast.
> >>>>>>
> >>>>>> [ ] +1 Release this package as Apache Tika 1.6
> >>>>>> [ ] -1 Do not release this package because...
> >>>>>>
> >>>>>> Thank you!
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Chris
> >>>>>>
> >>>>>> P.S. Here is my +1!
