[
https://issues.apache.org/jira/browse/NUTCH-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-840:
---------------------------------------
Attachment: NUTCH-840.patch
Hi Julien. I have absolutely no idea how or when I ended up working on this,
but I think the attachment nearly addresses this issue. It is from a while back
and to be honest I can't really remember working on it...
Anyway, I think the parse-tika tests fail because it is not quite working
properly yet. The patch also changes the directory structure to o.a.n.p.tika
rather than the existing o.a.n.tika, which is inconsistent with the other parser
plugin implementations we ship with Nutch.
Sorry for hijacking this one slightly.
> Port tests from parse-html to parse-tika
> ----------------------------------------
>
> Key: NUTCH-840
> URL: https://issues.apache.org/jira/browse/NUTCH-840
> Project: Nutch
> Issue Type: Task
> Components: parser
> Affects Versions: 1.1
> Reporter: Julien Nioche
> Assignee: Julien Nioche
> Fix For: nutchgora
>
> Attachments: NUTCH-840.patch, NUTCH-840.patch
>
>
> We don't have tests for HTML in parse-tika, so I'll copy them from the old
> parse-html plugin.