[
https://issues.apache.org/jira/browse/NUTCH-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992440#comment-12992440
]
Julien Nioche commented on NUTCH-965:
-------------------------------------
This should be optional but activated by default.
Parsing is also done within the fetching step, so the same change would be
needed there as well.
It would be nice to have this in 1.3.
Note: changing the title to something like "skip parsing for truncated
documents" would describe the issue more accurately.
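To make the suggestion concrete, here is a minimal sketch of what an optional, on-by-default truncation check could look like. It is not taken from parserJob.patch. The class name TruncationCheck, the skipTruncated flag and the plain header map are all hypothetical, and it assumes truncation can be detected by comparing the declared Content-Length with the number of bytes actually fetched.

```java
import java.util.Map;

/** Hypothetical helper, not part of Nutch: decides whether parsing should be skipped. */
public class TruncationCheck {

    /**
     * Returns true when the declared Content-Length is larger than the number
     * of bytes actually fetched. A missing or malformed header is treated as
     * "not truncated" because there is nothing to compare against.
     * Assumption: the header map uses the exact key "Content-Length".
     */
    public static boolean isTruncated(byte[] fetched, Map<String, String> headers) {
        String declared = headers.get("Content-Length");
        if (declared == null) {
            return false;
        }
        try {
            return Long.parseLong(declared.trim()) > fetched.length;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /**
     * Returns true when the document should be handed to the parser.
     * skipTruncated stands in for a configuration switch that would
     * default to true, so the check is optional but active by default.
     */
    public static boolean shouldParse(byte[] fetched, Map<String, String> headers,
                                      boolean skipTruncated) {
        return !(skipTruncated && isTruncated(fetched, headers));
    }
}
```

The same shouldParse call would have to guard both the standalone parse step and the parsing done inside the fetcher, which is why the check is kept in one place here.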
> Parsing takes up 100% CPU
> -------------------------
>
> Key: NUTCH-965
> URL: https://issues.apache.org/jira/browse/NUTCH-965
> Project: Nutch
> Issue Type: Improvement
> Components: parser
> Reporter: Alexis
> Attachments: parserJob.patch
>
>
> The issue you're likely to run into when parsing truncated FLV files is
> described here:
> http://www.mail-archive.com/[email protected]/msg01880.html
> The parser library gets stuck in an infinite loop when it encounters corrupted
> data, for example when big binary files are truncated at fetch time.