[
https://issues.apache.org/jira/browse/NUTCH-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154107#comment-16154107
]
ASF GitHub Bot commented on NUTCH-1129:
---------------------------------------
thilohaas commented on issue #205: WIP: NUTCH-1129 microdata for Nutch 1.x
URL: https://github.com/apache/nutch/pull/205#issuecomment-327262863
Sadly I'm currently too busy, but will definitely look into it as soon as
possible.
Do you have an idea of how to pass an array or hash of strings to the
filter (see my comment on the PR)? That would let me simplify the process
and come up with an alternative way of storing triples on the documents.
By the way, the any23 web service seems to be broken: it fails on every
website I've tried, including Google:
http://any23.org/any23/?format=best&uri=https%3A%2F%2Fgoogle.com&validation-mode=none
> Any23 Nutch plugin
> ------------------
>
> Key: NUTCH-1129
> URL: https://issues.apache.org/jira/browse/NUTCH-1129
> Project: Nutch
> Issue Type: New Feature
> Components: parser
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Minor
> Fix For: 2.5
>
> Attachments: NUTCH-1129.patch
>
>
> This plugin should build on the Any23 library to provide us with a plugin
> which extracts RDF data from HTTP and file resources. Although as of writing
> Any23 is not part of the ASF, the project is working towards integration into
> the Apache Incubator. Once the project proves its value, this would be an
> excellent addition to the Nutch 1.X codebase.