> From: Hong-Thai Nguyen
> Sent: September 11, 2014 1:40:08pm PDT
> To: dev@tika.apache.org
> Subject: Re: NPE on all *.odt, odp, .ods documents
>
> I was wrong when I said that all OpenDocument files failed; some files
> passed, but a lot of them failed with an NPE in OpenDocumentParser line 161.

OK, thanks for clarifying.

So I assume we now have a unit test that would fail without the fix, yes?

Thanks,

-- Ken

>
> I'm looking at OpenDocumentParser.java in 1.6. The bug comes from the block
> at lines 126-130 when the input is a TikaInputStream (our case):
>     if (container instanceof ZipFile) {
>         zipFile = (ZipFile) container;
>     } else if (tis.hasFile()) {
>         zipFile = new ZipFile(tis.getFile());
>     }
>
> zipFile is sometimes never created.
>
>
> For information, this bug is indeed fixed in 1.7-SNAPSHOT. Here's a
> detailed comparison of the two versions on the same corpus:
> 1.6:
> 14-09-09 16:17:43 INFO (DocumentConversionErrorPlugin.java : 115) [pool-2-thread-2] Summary of document conversion errors:
> - pdf (7)
> - pptx (10)
> - doc (6)
> - ppt (14)
> - xls (9)
> - dwg (4)
> - odp (495)
> - odt (839)
> - pps (2)
> - ods (1)
>
> 1.7-SNAPSHOT:
> - pdf (7)
> - pptx (10)
> - doc (6)
> - ppt (14)
> - xls (9)
> - dwg (4)
> - odp (2)
> - pps (2)
>
>
> On Thu, Sep 11, 2014 at 8:55 PM, Ken Krugler <kkrugler_li...@transpac.com>
> wrote:
>
>>
>>> From: Hong-Thai Nguyen
>>> Sent: September 11, 2014 5:21:41am PDT
>>> To: dev@tika.apache.org
>>> Subject: NPE on all *.odt, odp, .ods documents
>>>
>>> Hi all,
>>>
>>> I've tested conversion with Tika 1.6 on our corpus; all OpenOffice
>>> document types fail with an NPE. A fix has been made in
>>> https://issues.apache.org/jira/browse/TIKA-1412, but it is only available
>>> from 1.7. That's a fatal error for me.
>>
>> I'm curious - don't we have unit tests for OpenOffice document types?
>>
>> If so, then why are they passing, but all docs tried by Hong-Thai fail?
>>
>> -- Ken
>>
>>>
>>> Should we release a 1.6.1 with the fix for TIKA-1412?
>>>
>>> Stack trace:
>>> Caused by: com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@318e5904
>>>     at com.polyspot.document.converter.DocumentConverter.realizeTikaConversion(DocumentConverter.java:233)
>>>     at com.polyspot.document.converter.DocumentConverter.convert(DocumentConverter.java:127)
>>>     at com.polyspot.wscrawlers.PsDocConverter.getConvertedDocument(PsDocConverter.java:83)
>>>     ... 22 more
>>> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@318e5904
>>>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:246)
>>>     at com.polyspot.document.converter.DocumentConverter.realizeTikaConversion(DocumentConverter.java:225)
>>>     ... 24 more
>>> Caused by: java.lang.NullPointerException
>>>     at org.apache.tika.parser.odf.OpenDocumentParser.parse(OpenDocumentParser.java:161)
>>>     at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
>>>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
>>>     ... 25 more
>>>
>>> --
>>> --------------
>>> Hong-Thai

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
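
For anyone following the thread: per the report, the quoted 1.6 block leaves zipFile null whenever the stream has neither an open ZipFile container nor a backing file. Here is a minimal, self-contained sketch of one way to guard against that, by falling back to reading the archive sequentially with java.util.zip.ZipInputStream. It does not use Tika and is not claimed to be the actual TIKA-1412 patch; ZipFallbackSketch, listEntries, sampleZip, and the container/file parameters are hypothetical stand-ins for the parser's state.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipFallbackSketch {

    /**
     * Hypothetical helper: lists the entry names of a ZIP-based document.
     * 'container' and 'file' stand in for the open container and backing
     * file of the input stream in the quoted code; either may be null.
     */
    public static List<String> listEntries(Object container, File file, InputStream stream)
            throws IOException {
        ZipFile zipFile = null;
        boolean openedHere = false;
        if (container instanceof ZipFile) {
            zipFile = (ZipFile) container;
        } else if (file != null) {
            zipFile = new ZipFile(file);
            openedHere = true;
        }

        List<String> names = new ArrayList<>();
        if (zipFile != null) {
            // Random-access path: same two cases as the quoted 1.6 block.
            try {
                for (ZipEntry entry : Collections.list(zipFile.entries())) {
                    names.add(entry.getName());
                }
            } finally {
                if (openedHere) {
                    zipFile.close();
                }
            }
        } else {
            // Fallback path: no container and no file, so read the stream
            // sequentially instead of dereferencing a null zipFile.
            ZipInputStream zis = new ZipInputStream(stream);
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Builds a tiny in-memory ZIP with two entries, for demonstration only. */
    public static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("mimetype"));
            zos.write("application/vnd.oasis.opendocument.text".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("content.xml"));
            zos.write("<doc/>".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Neither a container nor a file is available: the stream path is taken.
        InputStream in = new ByteArrayInputStream(sampleZip());
        System.out.println(listEntries(null, null, in)); // prints [mimetype, content.xml]
    }
}
```

A unit test along these lines, feeding the parser a plain stream with no backing file, would exercise the path that the quoted block does not cover.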