[ https://issues.apache.org/jira/browse/NUTCH-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrzej Bialecki closed NUTCH-157.
-----------------------------------

    Resolution: Won't Fix

> Problem during parsing msword document. It fetching properly but parsing is not working. Please show me the way how can i parse it
> -----------------------------------------------------------------------------------------------------------------------------------
>
>          Key: NUTCH-157
>          URL: https://issues.apache.org/jira/browse/NUTCH-157
>      Project: Nutch
>   Issue Type: Bug
> Affects Versions: 0.7
>  Environment: windows
>     Reporter: karamjit
>
> Ms word document not parsing.
> Error messages :----------
> Page from url Path in fetch ====file:/D:/karam/Atlantis_Tools/Crawl_Files/compareFVAJ.doc
> 060301 173204 fetching file:/D:/karam/Atlantis_Tools/Crawl_Files/compareFVAJ.doc
> 060301 173204 Parsing [file:/D:/karam/Atlantis_Tools/Crawl_Files/compareFVAJ.doc] with [EMAIL PROTECTED]
> 060301 173204 fetch of file:/D:/karam/Atlantis_Tools/Crawl_Files/compareFVAJ.doc failed with: java.lang.NoSuchMethodError: org.apache.poi.hpsf.SummaryInformation.getEditTime()J
> 060301 173204 Could not clean the content-type [], Reason is [org.apache.nutch.util.mime.MimeTypeException: The type can not be null or empty]. Using its raw version...
> 060301 173204 Parsing [file:/D:/karam/Atlantis_Tools/Crawl_Files/compareFVAJ.doc] with [EMAIL PROTECTED]
> 060301 173205 status: segment 20060301173203, 1 pages, 1 errors, 35840 bytes, 1000 ms
> 060301 173205 status: 1.0 pages/s, 280.0 kb/s, 35840.0 bytes/page

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.