[ 
https://issues.apache.org/jira/browse/NUTCH-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  closed NUTCH-157.
-----------------------------------

    Resolution: Won't Fix

> Problem parsing MS Word document. It is fetched properly, but parsing does 
> not work. Please show me how I can parse it.
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-157
>                 URL: https://issues.apache.org/jira/browse/NUTCH-157
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 0.7
>         Environment: Windows
>            Reporter: karamjit
>
> MS Word document is not parsed.
> Error messages:
> Page from url Path in fetch 
> ====file:/D:/karam/Atlantis_Tools/Crawl_Files/compareFVAJ.doc
> 060301 173204 fetching file:/D:/karam/Atlantis_Tools/Crawl_Files/compareFVAJ.doc
> 060301 173204 Parsing [file:/D:/karam/Atlantis_Tools/Crawl_Files/compareFVAJ.doc] with [EMAIL PROTECTED]
> 060301 173204 fetch of file:/D:/karam/Atlantis_Tools/Crawl_Files/compareFVAJ.doc failed with: java.lang.NoSuchMethodError: org.apache.poi.hpsf.SummaryInformation.getEditTime()J
> 060301 173204 Could not clean the content-type [], Reason is [org.apache.nutch.util.mime.MimeTypeException: The type can not be null or empty]. Using its raw version...
> 060301 173204 Parsing [file:/D:/karam/Atlantis_Tools/Crawl_Files/compareFVAJ.doc] with [EMAIL PROTECTED]
> 060301 173205 status: segment 20060301173203, 1 pages, 1 errors, 35840 bytes, 1000 ms
> 060301 173205 status: 1.0 pages/s, 280.0 kb/s, 35840.0 bytes/page
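
A note on the error in the log above: in JVM descriptor notation,
getEditTime()J means a method named getEditTime that takes no arguments and
returns a long. A NoSuchMethodError of this kind usually indicates that the
class loaded at run time differs from the one the calling code was compiled
against, for example because a different version of the POI jar is on the
classpath. The sketch below is one way to check what a given classpath
actually provides. The class name MethodProbe and the helper probe are
hypothetical and are not part of Nutch or POI; it only uses standard Java
reflection, and it assumes you run it with the same jars the parser sees.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Nutch or POI.
public class MethodProbe {

    // Reports whether a public no-argument method exists on the named class,
    // what it returns, and where the class was loaded from.
    static String probe(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);
            Method m = cls.getMethod(methodName);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            String origin = (src == null || src.getLocation() == null)
                    ? "bootstrap/unknown"
                    : src.getLocation().toString();
            return "found: returns " + m.getReturnType().getName()
                    + ", loaded from " + origin;
        } catch (ClassNotFoundException e) {
            return "class not found: " + className;
        } catch (NoSuchMethodException e) {
            return "method not found: " + className + "." + methodName + "()";
        }
    }

    public static void main(String[] args) {
        // Defaults are taken from the error message in the report.
        String cls = args.length > 0 ? args[0]
                : "org.apache.poi.hpsf.SummaryInformation";
        String method = args.length > 1 ? args[1] : "getEditTime";
        System.out.println(probe(cls, method));
    }
}
```

If the output says the method was found but with a return type other than
long, or names a jar you did not expect, that would be consistent with a
version mismatch. The probe does not tell you which POI version is the
right one for a given Nutch release.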

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
