[ https://issues.apache.org/jira/browse/NUTCH-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208345#comment-13208345 ]
Lewis John McGibbney commented on NUTCH-1129:
---------------------------------------------
Yeah, you're right, Markus. The Any23 libraries are parsers for extracting
things like microdata; we would rely upon Tika for content extraction. As for
Any23, I think we're currently stuck way back at 0.6 or something, so there is
obviously work to be done here. I've been looking at
https://svn.apache.org/viewvc/nutch/trunk/src/plugin/microformats-reltag/
and I'll work towards reusing as much of the Tika stuff we have as possible.
> Any23 Nutch plugin
> ------------------
>
> Key: NUTCH-1129
> URL: https://issues.apache.org/jira/browse/NUTCH-1129
> Project: Nutch
> Issue Type: New Feature
> Components: parser
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Minor
> Fix For: 1.5
>
> Attachments: NUTCH-1129.patch
>
>
> This plugin should build on the Any23 library to provide us with a plugin
> which extracts RDF data from HTTP and file resources. Although as of writing
> Any23 is not part of the ASF, the project is working towards integration into
> the Apache Incubator. Once the project proves its value, this would be an
> excellent addition to the Nutch 1.X codebase.