WordExtractor throws java.util.NoSuchElementException on some documents
-----------------------------------------------------------------------

                 Key: NUTCH-326
                 URL: http://issues.apache.org/jira/browse/NUTCH-326
             Project: Nutch
          Issue Type: Bug
          Components: indexer
    Affects Versions: 0.7.2, 0.7.1
            Reporter: Tom Jensen
            Priority: Minor


At line 156 of org.apache.nutch.parse.msword.WordExtractor, a 
java.util.NoSuchElementException is occasionally thrown because the code does 
not check whether the Iterator has been exhausted.  I suggest adding this:

        if (!textIt.hasNext()) {
                break;
        }

just before line 156.  I tested this with the problem Word documents: the 
exception was no longer thrown and the text was extracted successfully.  
Documents whose text was extracted successfully before the change continued 
to do so.
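
For illustration, here is a minimal, self-contained sketch of the guard in 
isolation.  It is not the actual WordExtractor code: the class name 
IteratorGuardSketch, the method joinPieces and the pieceCount parameter are 
hypothetical stand-ins for a loop that expects more text fragments than the 
Iterator (named textIt, as in the snippet above) can supply.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorGuardSketch {

    // Hypothetical stand-in for the extraction loop: it visits pieceCount
    // pieces and pulls one text fragment from the iterator for each.
    static String joinPieces(int pieceCount, Iterator<String> textIt) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < pieceCount; i++) {
            // The proposed guard: stop once the iterator is exhausted
            // instead of letting next() throw NoSuchElementException.
            if (!textIt.hasNext()) {
                break;
            }
            out.append(textIt.next());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> fragments = Arrays.asList("Hello, ", "world");
        // Three pieces are expected, but only two fragments exist.
        System.out.println(joinPieces(3, fragments.iterator()));
    }
}
```

Without the hasNext() check, the third pass through the loop would call 
next() on an exhausted Iterator and throw.  With it, the loop ends early and 
the text gathered so far is returned.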


        
