tryma wrote:
Hi,

I initially thought there was an issue with POI, so I posted my initial question on the POI-user list. Now I see this is happening in the Nutch classes for the MS parse plugin, not POI, so I'm giving this list a go.

Here's a trace I get when I catch any exception occurring as I attempt to call the MSExcelParser's getParse(Content). It seems I get an NPE in MSBaseParser.getParse().

[#|2006-10-04T09:13:15.102+0200|WARNING|sun-appserver-ee9.1|javax.enterprise.system.stream.err|_ThreadID=16;_ThreadName=httpWorkerThread-8080-1;_RequestID=0b18e2ae-0f79-4241-9e29-a322c8ae2bc6;|
java.lang.NullPointerException
        at org.apache.nutch.parse.ms.MSBaseParser.getParse(MSBaseParser.java:94)
        at org.apache.nutch.parse.msexcel.MSExcelParser.getParse(MSExcelParser.java:40)
        at <my_package>.DocumentParser.parseDocument(DocumentParser.java:154)
        ...

Looking at the source (MSBaseParser.java) at this line, it goes:

****SNIP****
      extractor.extract(new ByteArrayInputStream(raw));
      text = extractor.getText();
      properties = extractor.getProperties();
      outlinks = OutlinkExtractor.getOutlinks(text, content.getUrl(), getConf());
    } catch (Exception e) {
      return new ParseStatus(ParseStatus.FAILED,
                             "Can't be handled as micrsosoft document. " + e)
                             .getEmptyParse(this.conf);
    }

    // collect meta data
    Metadata metadata = new Metadata();
    title = properties.getProperty(DublinCore.TITLE);   <========== This is line 94 as indicated in the trace
    properties.remove(DublinCore.TITLE);
****SNIP****

So I can only gather that my properties object is null. As seen above in the snippet from the MSBaseParser source, properties is initially null but is assigned a value from the ExcelExtractor (properties = extractor.getProperties();), which I assume is returning null.

Any ideas how I can get around this, or am I not setting some required properties?

Btw, I've noticed a spelling mistake in the ParseStatus that is returned in the above lines of code: "Micrsosoft"
Fixed - thanks for reporting it.

--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
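
For anyone reading this thread with the same NullPointerException: the trace is consistent with the poster's inference that properties is null when the title lookup runs. A null guard around that lookup is one way such a failure could be avoided. The following is only a minimal sketch of that pattern, not the Nutch source or its fix. The class NullSafeTitle, the method takeTitle, and the key "title" are hypothetical stand-ins (the quoted code uses DublinCore.TITLE); only java.util.Properties is a real API here.

```java
import java.util.Properties;

public class NullSafeTitle {

    // Placeholder key for illustration; the quoted Nutch code uses DublinCore.TITLE.
    static final String TITLE_KEY = "title";

    // Returns the title and removes it from the properties.
    // Falls back to an empty string when the properties object itself is null
    // or when it holds no title, instead of throwing a NullPointerException.
    static String takeTitle(Properties properties) {
        if (properties == null) {
            return "";
        }
        String title = properties.getProperty(TITLE_KEY, "");
        properties.remove(TITLE_KEY);
        return title;
    }

    public static void main(String[] args) {
        // Simulates an extractor that handed back no properties at all.
        System.out.println("[" + takeTitle(null) + "]");

        // Simulates an extractor that did supply a title.
        Properties props = new Properties();
        props.setProperty(TITLE_KEY, "Quarterly report");
        System.out.println("[" + takeTitle(props) + "]");
    }
}
```

Whether an empty title is the right fallback, or whether the parse should instead be reported as failed, is a design choice this sketch does not settle.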
