Hi, I tried the latest builds, nutch-2006-10-13.tar.gz and nutch-2006-10-19.tar.gz, but the NPE doesn't seem to be fixed. I still get the same exception message for many documents in many formats: Excel, but also Word and PowerPoint:
2006-10-19 16:41:09,265 WARN parse.ParseUtil - Unable to successfully parse content file://C:/docs_a_indexer/test.doc of type application/msword
2006-10-19 16:41:09,265 WARN fetcher.Fetcher - Error parsing: file:/C:/docs_a_indexer/test.doc: failed(2,0): Can't be handled as Microsoft document. org.apache.nutch.parse.msword.FastSavedException: Fast-saved files are unsupported at this time

Could you please help me? The volume of rejected documents is large.

Thanks in advance,
best regards,
Aïcha

Andrzej Bialecki wrote:
>
> tryma wrote:
>> Hi Andrzej,
>>
>> Great that you've fixed the NPE, thanks! No prob with the spelling mistake,
>> just wasn't sure what you'd fixed when you quoted my last message. ;)
>>
>> How do I get hold of this change, get the nightly build and use that?
>>
>
> Yes, or use 'svn update' if you checked out your sources from SVN.
>
> --
> Best regards,
> Andrzej Bialecki <><
> Information Retrieval, Semantic Web
> Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com

Sent from the Nutch - Dev mailing list archive at Nabble.com.
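The `FastSavedException` in the log refers to Word 97-2003 files whose last save was an incremental ("fast") save. To estimate how many files in a collection will be rejected this way, one option is to pre-scan them. The sketch below assumes the FibBase layout from Microsoft's [MS-DOC] specification: magic number `0xA5EC` at offset 0, and a flag word at offset `0x0A` in which bit `0x0004` (fComplex) marks a fast save and bit `0x0100` (fEncrypted) marks encryption. It is not Nutch's own code. It only inspects the first bytes of the `WordDocument` stream. Pulling that stream out of the OLE2 container is assumed and not shown. `FastSaveCheck` and `DocStatus` are hypothetical names.

```java
/**
 * Minimal sketch (hypothetical class): classify a Word 97-2003 (.doc) file from the
 * first bytes of its "WordDocument" stream (the FIB header), assuming the FibBase
 * layout described in the [MS-DOC] specification. Extracting that stream from the
 * OLE2 container is assumed to happen elsewhere.
 */
public final class FastSaveCheck {

    /** Hypothetical result type for this sketch. */
    public enum DocStatus { NOT_WORD, FAST_SAVED, ENCRYPTED, OK }

    private static final int W_IDENT = 0xA5EC;      // FibBase.wIdent magic number
    private static final int FLAGS_OFFSET = 0x0A;   // offset of the FibBase flag word
    private static final int F_COMPLEX = 0x0004;    // bit 2: last save was incremental ("fast save")
    private static final int F_ENCRYPTED = 0x0100;  // bit 8: document is encrypted (or obfuscated)

    private FastSaveCheck() {}

    /** Reads an unsigned little-endian 16-bit value at the given offset. */
    private static int readUShortLE(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }

    /** Classifies a FIB header; returns NOT_WORD if it is too short or lacks the magic number. */
    public static DocStatus classify(byte[] fibHeader) {
        if (fibHeader == null || fibHeader.length < FLAGS_OFFSET + 2
                || readUShortLE(fibHeader, 0) != W_IDENT) {
            return DocStatus.NOT_WORD;
        }
        int flags = readUShortLE(fibHeader, FLAGS_OFFSET);
        if ((flags & F_COMPLEX) != 0) {
            return DocStatus.FAST_SAVED;   // the case reported as FastSavedException above
        }
        if ((flags & F_ENCRYPTED) != 0) {
            return DocStatus.ENCRYPTED;
        }
        return DocStatus.OK;
    }

    public static void main(String[] args) {
        byte[] header = new byte[32];
        header[0] = (byte) 0xEC;                 // wIdent low byte
        header[1] = (byte) 0xA5;                 // wIdent high byte
        System.out.println(classify(header));    // OK
        header[FLAGS_OFFSET] = (byte) 0x04;      // set fComplex
        System.out.println(classify(header));    // FAST_SAVED
    }
}
```

With Apache POI on the classpath, the header bytes could be read via `new POIFSFileSystem(in).createDocumentInputStream("WordDocument")`. Files flagged as fast-saved can usually be made parseable by re-saving them in Word with the "Allow fast saves" option turned off, which writes a full (non-incremental) save.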
