Thank you, Sergey! OK, I will proceed. Thanks for your contributions to Tika, and yes, we'll get there.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Sergey Beryozkin <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, July 28, 2014 3:16 PM
To: "[email protected]" <[email protected]>
Subject: Re: [VOTE] Apache Tika 1.6 release candidate #1

>Hi Chris,
>
>This is not an issue that should block the release, I was careful not to
>vote with a minus one. I've become a bit impatient, but no one really
>blocks me from completing this pure documentation effort myself, I was
>hoping that someone would do it first :-).
>
>Please go ahead with the release as planned, thanks for offering the
>chance to delay the release, but I can not go for it, we'll get there as
>far as the documentation is concerned :-)
>
>Thanks, Sergey
>
>On 28/07/14 21:45, Mattmann, Chris A (3980) wrote:
>> Thanks Sergey - I pushed to 1.7 since we have been having a DISCUSS
>> thread for a few weeks about getting 1.6 out. Do you have a patch right
>> now for TIKA-1367? If so I'm happy to incorporate it and roll an RC #2
>> to get it in. If you don't have a patch yet, would you mind terribly if
>> we pushed out 1.6, which already today has a ton of great updates, then
>> shortly thereafter rolled a 1.7 (or did so when you finished with
>> TIKA-1367)?
>>
>> Cheers,
>> Chris
>>
>> -----Original Message-----
>> From: Sergey Beryozkin <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Monday, July 28, 2014 11:38 AM
>> To: "[email protected]" <[email protected]>
>> Subject: Re: [VOTE] Apache Tika 1.6 release candidate #1
>>
>>> +0 given that it appears that the tika-parsers dependencies
>>> documentation issue has been pushed away. I'm getting confused why.
>>>
>>> Thanks. Sergey
>>>
>>> [1] https://issues.apache.org/jira/browse/TIKA-1367
>>>
>>> On 28/07/14 17:16, Tyler Palsulich wrote:
>>>> +1
>>>>
>>>> OSX 10.9.3, Java 1.7
>>>>
>>>> Tyler
>>>>
>>>> On Mon, Jul 28, 2014 at 7:09 AM, Allison, Timothy B. <[email protected]> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> Linux version 2.6.32-431.5.1.el6.x86_64: Java 1.6 and 1.7
>>>>> Windows 7, Java 1.7
>>>>>
>>>>> I also ran Tika 1.5 and 1.6 rc1 against a random selection of 10,000
>>>>> docs (all formats) plus all available msoffice-x files in govdocs1,
>>>>> yielding 10,413 docs. There were several improvements in text
>>>>> extraction for PDFs (mostly spacing) and 4 fewer exceptions
>>>>> (2 ppt, 1 doc and 1 pdf).
>>>>>
>>>>> There was one regression:
>>>>> http://digitalcorpora.org/corp/nps/files/govdocs1/268/268620.pptx
>>>>>
>>>>> Stacktrace:
>>>>> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -369073454
>>>>>   at java.lang.String.checkBounds(String.java:371)
>>>>>   at java.lang.String.<init>(String.java:415)
>>>>>   at org.apache.poi.util.StringUtil.getFromCompressedUnicode(StringUtil.java:114)
>>>>>   at org.apache.poi.poifs.filesystem.Ole10Native.<init>(Ole10Native.java:163)
>>>>>   at org.apache.poi.poifs.filesystem.Ole10Native.createFromEmbeddedOleObject(Ole10Native.java:91)
>>>>>   at org.apache.poi.poifs.filesystem.Ole10Native.createFromEmbeddedOleObject(Ole10Native.java:63)
>>>>>   at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedOLE(AbstractOOXMLExtractor.java:250)
>>>>>   at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:199)
>>>>>   at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:115)
>>>>>   at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112)
>>>>>   at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
>>>>>   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:243)
>>>>>
>>>>> -----Original Message-----
>>>>> From: Mattmann, Chris A (3980) [mailto:[email protected]]
>>>>> Sent: Monday, July 28, 2014 12:22 AM
>>>>> To: [email protected]
>>>>> Cc: [email protected]
>>>>> Subject: [VOTE] Apache Tika 1.6 release candidate #1
>>>>>
>>>>> Hi Folks,
>>>>>
>>>>> A candidate for the Tika 1.6 release is available at:
>>>>>
>>>>> http://people.apache.org/~mattmann/apache-tika-1.6/rc1/
>>>>>
>>>>> The release candidate is a zip archive of the sources in:
>>>>>
>>>>> http://svn.apache.org/repos/asf/tika/tags/1.6/
>>>>>
>>>>> The SHA1 checksum of the archive is
>>>>> 076ad343be56a540a4c8e395746fa4fda5b5b6d3.
>>>>>
>>>>> A Maven staging repository is available at:
>>>>>
>>>>> https://repository.apache.org/content/repositories/orgapachetika-1003/
>>>>>
>>>>> Please vote on releasing this package as Apache Tika 1.6.
>>>>> The vote is open for the next 72 hours and passes if a majority of at
>>>>> least three +1 Tika PMC votes are cast.
>>>>>
>>>>> [ ] +1 Release this package as Apache Tika 1.6
>>>>> [ ] -1 Do not release this package because...
>>>>>
>>>>> Thank you!
>>>>>
>>>>> Cheers,
>>>>> Chris
>>>>>
>>>>> P.S. Here is my +1!
