Hi,

I tried the latest releases, nutch-2006-10-13.tar.gz and
nutch-2006-10-19.tar.gz,
but the NPE doesn't seem to be fixed. I always get the same exception
message for a lot of documents in a lot of formats: Excel, but Word and
PowerPoint too:

2006-10-19 16:41:09,265 WARN  parse.ParseUtil - Unable to successfully parse
content file://C:/docs_a_indexer/test.doc of type application/msword
2006-10-19 16:41:09,265 WARN  fetcher.Fetcher - Error parsing:
file:/C:/docs_a_indexer/test.doc: failed(2,0): Can't be handled as Microsoft
document. org.apache.nutch.parse.msword.FastSavedException: Fast-saved files
are unsupported at this time

Could you please help me? The volume of rejected documents is
large.

Thanks in advance,
best regards,
Aïcha



Andrzej Bialecki wrote:
> 
> tryma wrote:
>> Hi Andrzej,
>>
>> Great that you've fixed the NPE, thanks! No prob with the spelling
>> mistake,
>> just wasn't sure what you'd fixed when you quoted my last message. ;)
>>
>> How do I get hold of this change, get the nightly build and use that?
>>
>>   
> 
> Yes, or use 'svn update' if you checked out your sources from SVN.
> 
> -- 
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Problem-parsing-some-MS-Excel---other-formats-%28Office-2003%29-tf2408217.html#a6898319
Sent from the Nutch - Dev mailing list archive at Nabble.com.
