Arkadi Kosmynin created NUTCH-1993:
--------------------------------------
Summary: Nutch does not use backup parsers
Key: NUTCH-1993
URL: https://issues.apache.org/jira/browse/NUTCH-1993
Project: Nutch
Issue Type: Bug
Components: parser
Reporter: Arkadi Kosmynin
From reading the code, it is clear that it is designed to allow several
parsers to be tried on a document in sequence until it is successfully parsed.
In practice, this does not work because these lines
if (parseResult != null && !parseResult.isEmpty())
return parseResult;
break the loop even if parsing has failed: parseResult is not empty in that
case, because it still contains a ParseData with ParseStatus.FAILED.
A fix:
if (parseResult != null && parseResult.isAnySuccess())
return parseResult;
Where parseResult.isAnySuccess() returns true if at least one of the parsing
attempts was successful.
This fix is important because it allows backup parsers to be used as
originally designed and thus increases index completeness.
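To illustrate the difference between the two checks, here is a minimal,
self-contained sketch. It does not use the real Nutch classes: Result, Parser,
Status, demoParsers, parseOld and parseFixed are hypothetical stand-ins invented
for this example, and returning null when every parser fails is a
simplification, not what Nutch does.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for illustration only; NOT the real Nutch API.
class BackupParserSketch {

    enum Status { SUCCESS, FAILED }

    /** Simplified parse result: one status per parse attempt it contains. */
    static final class Result {
        final String parserName;
        final List<Status> statuses;

        Result(String parserName, List<Status> statuses) {
            this.parserName = parserName;
            this.statuses = statuses;
        }

        boolean isEmpty() { return statuses.isEmpty(); }

        boolean isAnySuccess() { return statuses.contains(Status.SUCCESS); }
    }

    interface Parser {
        Result parse(String content);
    }

    /** Old check: a non-empty result ends the loop, even if it only holds FAILED. */
    static Result parseOld(List<Parser> parsers, String content) {
        for (Parser p : parsers) {
            Result r = p.parse(content);
            if (r != null && !r.isEmpty()) {
                return r;
            }
        }
        return null; // simplification: no parser produced anything
    }

    /** Proposed check: only a result with at least one success ends the loop. */
    static Result parseFixed(List<Parser> parsers, String content) {
        for (Parser p : parsers) {
            Result r = p.parse(content);
            if (r != null && r.isAnySuccess()) {
                return r;
            }
        }
        return null; // simplification: every parser failed
    }

    /** A primary parser that always fails, followed by a backup that succeeds. */
    static List<Parser> demoParsers() {
        Parser primary = c -> new Result("primary", Collections.singletonList(Status.FAILED));
        Parser backup = c -> new Result("backup", Collections.singletonList(Status.SUCCESS));
        return Arrays.asList(primary, backup);
    }

    public static void main(String[] args) {
        List<Parser> parsers = demoParsers();
        // Old check returns the failed result of the first parser.
        System.out.println(parseOld(parsers, "doc").parserName);
        // Proposed check falls through to the backup parser.
        System.out.println(parseFixed(parsers, "doc").parserName);
    }
}
```

With these stand-ins, the old check stops at the first parser although it
failed, while the proposed check continues to the backup parser.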