Problem parsing MS Word document. Fetching works properly, but parsing 
does not. Please show me how I can parse it.
-----------------------------------------------------------------------------------------------------------------------------------

         Key: NUTCH-157
         URL: http://issues.apache.org/jira/browse/NUTCH-157
     Project: Nutch
        Type: Bug
    Versions: 0.7    
 Environment: Windows
    Reporter: karamjit


MS Word documents are not being parsed.

Error messages:

Page from url Path in fetch ====file:/D:/karam/Atlantis_Tools/Crawl_Files/compareFVAJ.doc
060301 173204 fetching file:/D:/karam/Atlantis_Tools/Crawl_Files/compareFVAJ.doc
060301 173204 Parsing [file:/D:/karam/Atlantis_Tools/Crawl_Files/compareFVAJ.doc] with [EMAIL PROTECTED]
060301 173204 fetch of file:/D:/karam/Atlantis_Tools/Crawl_Files/compareFVAJ.doc failed with: java.lang.NoSuchMethodError: org.apache.poi.hpsf.SummaryInformation.getEditTime()J
060301 173204 Could not clean the content-type [], Reason is [org.apache.nutch.util.mime.MimeTypeException: The type can not be null or empty]. Using its raw version...
060301 173204 Parsing [file:/D:/karam/Atlantis_Tools/Crawl_Files/compareFVAJ.doc] with [EMAIL PROTECTED]
060301 173205 status: segment 20060301173203, 1 pages, 1 errors, 35840 bytes, 1000 ms
060301 173205 status: 1.0 pages/s, 280.0 kb/s, 35840.0 bytes/page
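
A note on reading the error above: in JVM descriptor notation, "getEditTime()J" means a method that takes no arguments and returns a primitive long. A NoSuchMethodError means the SummaryInformation class was found at run time but has no method with that exact signature. That usually points to a different POI jar on the classpath than the one the parser was compiled against, though this log alone does not confirm it. The sketch below is one way to check. The class name PoiVersionCheck and the helper describe are made up for illustration and are not part of Nutch or POI. It only uses standard Java reflection, and it has to be run with the same classpath as the crawl.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of Nutch or POI.
public class PoiVersionCheck {

    // Reports where a class was loaded from and the return type of each
    // public no-argument method with the given name.
    static List<String> describe(String className, String methodName) {
        List<String> out = new ArrayList<String>();
        try {
            Class<?> cls = Class.forName(className);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // JDK core classes have no code source, so guard against null.
            out.add("loaded from: "
                    + (src == null ? "(bootstrap class path)" : String.valueOf(src.getLocation())));
            for (Method m : cls.getMethods()) {
                if (m.getName().equals(methodName) && m.getParameterTypes().length == 0) {
                    out.add(methodName + "() returns " + m.getReturnType().getName());
                }
            }
        } catch (ClassNotFoundException e) {
            out.add("class not found: " + className);
        }
        return out;
    }

    public static void main(String[] args) {
        // Class and method names are taken from the NoSuchMethodError in the log.
        for (String line : describe("org.apache.poi.hpsf.SummaryInformation", "getEditTime")) {
            System.out.println(line);
        }
    }
}
```

If the reported return type is anything other than long, or no getEditTime() line is printed at all, the jar named in the "loaded from" line is probably not the POI version the msword parser expects.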


