Did you fix the NPE issue too, or just the spelling mistake?
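
For reference, here is a minimal sketch of the kind of guard that would avoid the NPE, assuming the extractor's getProperties() can return null for some documents. It does not touch the Nutch classes: NullSafeProperties, orEmpty and takeTitle are hypothetical names used only for illustration, and the "title" key is a stand-in for DublinCore.TITLE.

```java
import java.util.Properties;

public class NullSafeProperties {

    // Hypothetical helper: fall back to an empty Properties object
    // when the extractor handed back null.
    static Properties orEmpty(Properties properties) {
        return properties != null ? properties : new Properties();
    }

    // Mirrors the "collect meta data" step quoted below: read the title,
    // then remove it from the properties. Returns null if there is no title.
    static String takeTitle(Properties properties, String titleKey) {
        Properties safe = orEmpty(properties);
        String title = safe.getProperty(titleKey);
        safe.remove(titleKey);
        return title;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("title", "Budget 2006");

        System.out.println(takeTitle(props, "title")); // title is present
        System.out.println(takeTitle(null, "title"));  // null properties, no NPE
    }
}
```

With a guard like this the parse would simply end up without a title instead of throwing. Whether that is the right behaviour, or whether a null result should be reported as a failed parse, is a design decision for the plugin.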


Best,

Trym


Andrzej Bialecki wrote:
> 
> tryma wrote:
>> Hi,
>>
>> I initially thought there was an issue with POI so I posted my initial
>> question on the POI-user list.
>> Actually, now I see this is happening in the Nutch classes for the MS
>> parse plugin, not POI, so I'm giving this list a go.
>>
>> Here's a trace I get when I catch any exception occurring as I attempt to
>> call the MSExcelParser's getParse(Content). It seems I get an NPE in
>> MSBaseParser.getParse().
>>
>> [#|2006-10-04T09:13:15.102+0200|WARNING|sun-appserver-ee9.1|javax.enterprise.system.stream.err|_ThreadID=16;_ThreadName=httpWorkerThread-8080-1;_RequestID=0b18e2ae-0f79-4241-9e29-a322c8ae2bc6;|
>> java.lang.NullPointerException
>>      at org.apache.nutch.parse.ms.MSBaseParser.getParse(MSBaseParser.java:94)
>>      at org.apache.nutch.parse.msexcel.MSExcelParser.getParse(MSExcelParser.java:40)
>>      at <my_package>.DocumentParser.parseDocument(DocumentParser.java:154)
>>      ...
>>
>> Looking at the source (MSBaseParser.java) at this line, it goes:
>>
>> ****SNIP****
>>       extractor.extract(new ByteArrayInputStream(raw));
>>       text = extractor.getText();
>>       properties = extractor.getProperties();
>>       outlinks = OutlinkExtractor.getOutlinks(text, content.getUrl(), getConf());
>>       
>>     } catch (Exception e) {
>>       return new ParseStatus(ParseStatus.FAILED,
>>                              "Can't be handled as micrsosoft document. " + e)
>>                              .getEmptyParse(this.conf);
>>     }
>>     
>>     // collect meta data
>>     Metadata metadata = new Metadata();
>>     title = properties.getProperty(DublinCore.TITLE);  // <== this is line 94 as indicated in the trace
>>     properties.remove(DublinCore.TITLE);
>> ****SNIP****
>>
>> So I can only gather that my properties object is null. As seen above in
>> the snippet from the MSBaseParser source, properties is initially null and
>> is then assigned the value returned by the ExcelExtractor
>> (properties = extractor.getProperties();), which I assume is returning null.
>>
>> Any ideas on how I can get around this, or am I failing to set some
>> required properties?
>>
>> Btw, I've noticed a spelling mistake in the ParseStatus that is returned
>> in the above lines of code: "Micrsosoft"
>>
>>   
> 
> Fixed - thanks for reporting it.
> 
> -- 
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
> 
> 
> 
> 
