[ 
https://issues.apache.org/jira/browse/NUTCH-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208345#comment-13208345
 ] 

Lewis John McGibbney commented on NUTCH-1129:
---------------------------------------------

Yeah, you're right, Markus. The Any23 libraries are parsers for extracting 
stuff like microdata; we would rely upon Tika for content extraction. 
Currently in Any23 I think we're stuck way back at 0.6 or something, so there 
is obviously work to be done here. I've been looking at 
https://svn.apache.org/viewvc/nutch/trunk/src/plugin/microformats-reltag/
and I'll work towards reusing as much of the Tika stuff we have as possible.
                
> Any23 Nutch plugin
> ------------------
>
>                 Key: NUTCH-1129
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1129
>             Project: Nutch
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: NUTCH-1129.patch
>
>
> This plugin should build on the Any23 library to provide us with a plugin 
> which extracts RDF data from HTTP and file resources. Although as of writing 
> Any23 is not part of the ASF, the project is working towards integration into 
> the Apache Incubator. Once the project proves its value, this would be an 
> excellent addition to the Nutch 1.X codebase. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
