[ https://issues.apache.org/jira/browse/NUTCH-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ferdy updated NUTCH-1097:
-------------------------
Attachment: NUTCH-1097-v4.patch
            NUTCH-1097-nutchgora_v2.patch
Thanks for looking into this, too.
The following patches (one for nutchgora and one for trunk/1.x) apply your
suggestion. By the way, the nutchgora_v3 patch did not include the required
change to plugin.xml; it was accidentally excluded. This is fixed now.
The change is also documented in ParserFactory, so that anyone reading the
code will see why it is done this particular way.
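To illustrate the idea of one plugin declaring several mimetypes, here is a
minimal sketch. It is not the code from the patch or from ParserFactory. The
class name, the method name and the pipe-separated value format are all
assumptions made for this example only.

```java
public class ContentTypeMatch {

    // Hypothetical helper: returns true if the mimetype of a document is
    // among the types a plugin declares. The declared value is assumed to be
    // either "*" (accept anything) or a list of types separated by '|'.
    static boolean accepts(String declared, String actual) {
        if ("*".equals(declared)) {
            return true;
        }
        // Split on a literal '|' and compare each entry as a plain string,
        // so the '+' in "application/xhtml+xml" needs no regex escaping.
        for (String candidate : declared.split("\\|")) {
            if (candidate.trim().equalsIgnoreCase(actual)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed example value for a plugin's declared content types.
        String declared = "text/html|application/xhtml+xml";
        System.out.println(accepts(declared, "text/html"));
        System.out.println(accepts(declared, "application/xhtml+xml"));
        System.out.println(accepts(declared, "application/pdf"));
    }
}
```

Comparing plain strings avoids a pitfall of treating the declared value as a
regular expression, where an unescaped '+' would be read as a quantifier and
"application/xhtml+xml" would not match itself.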
> application/xhtml+xml should be enabled in plugin.xml of parse-html; allow
> multiple mimetypes for plugin.xml
> ------------------------------------------------------------------------------------------------------------
>
> Key: NUTCH-1097
> URL: https://issues.apache.org/jira/browse/NUTCH-1097
> Project: Nutch
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.3
> Reporter: Ferdy
> Priority: Minor
> Fix For: 1.4
>
> Attachments: NUTCH-1097-nutchgora_v1.patch,
> NUTCH-1097-nutchgora_v2.patch, NUTCH-1097-v1.patch, NUTCH-1097-v2.patch,
> NUTCH-1097-v3.patch, NUTCH-1097-v4.patch
>
>
> The configuration in parse-plugins.xml expects the parse-html plugin to
> accept application/xhtml+xml; however, the plugin.xml of this plugin does
> not list this type. Either change the entry in parse-plugins.xml or change
> the parse-html plugin.xml. I suggest the latter. See patch.