WordExtractor throws java.util.NoSuchElementException on some documents
-----------------------------------------------------------------------
Key: NUTCH-326
URL: http://issues.apache.org/jira/browse/NUTCH-326
Project: Nutch
Issue Type: Bug
Components: indexer
Affects Versions: 0.7.2, 0.7.1
Reporter: Tom Jensen
Priority: Minor
At line 156 in org.apache.nutch.parse.msword.WordExtractor, the extractor occasionally
throws a java.util.NoSuchElementException because it never checks whether the
Iterator has been exhausted before calling next(). Suggest adding this:
if (!textIt.hasNext()) {
break;
}
just before line 156. Tested with the problematic Word documents: the exception
is no longer thrown and their text is extracted successfully. Other documents
whose text was already extracted successfully before the change continue to be
extracted correctly.
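A minimal, self-contained sketch of the guard pattern follows. The actual loop around line 156 in WordExtractor is not reproduced in this report, so the class name IteratorGuardSketch, the method extract, the List<String> of text pieces and the expected count are all hypothetical stand-ins; only the textIt name and the hasNext()/break check come from the suggestion above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class IteratorGuardSketch {

    // Hypothetical stand-in for the text-piece loop in WordExtractor:
    // the loop expects `expected` pieces, but a document may contain fewer.
    static String extract(List<String> pieces, int expected) {
        StringBuilder out = new StringBuilder();
        Iterator<String> textIt = pieces.iterator();
        for (int i = 0; i < expected; i++) {
            // The suggested guard: stop early instead of letting next() throw
            // java.util.NoSuchElementException once the iterator is exhausted.
            if (!textIt.hasNext()) {
                break;
            }
            out.append(textIt.next());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> pieces = new ArrayList<>();
        pieces.add("Hello, ");
        pieces.add("world");
        // Asking for more pieces than exist no longer throws.
        System.out.println(extract(pieces, 5)); // prints "Hello, world"
    }
}
```

Without the guard, the loop above would call next() a third time and throw java.util.NoSuchElementException; with it, the loop ends early and returns the text collected so far.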
--
This message is automatically generated by JIRA.