[ https://issues.apache.org/jira/browse/NUTCH-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrzej Bialecki updated NUTCH-932:
------------------------------------
Attachment: NUTCH-932.patch
This patch adds bulk retrieval of crawl results. It is still very rough (for
example, there is no way to select a crawlId or to limit the returned fields),
but it returns proper JSON.
This patch also includes other enhancements and bugfixes; with it applied I
was able to perform a complete crawl cycle via REST.
> Bulk REST API to retrieve crawl results as JSON
> -----------------------------------------------
>
> Key: NUTCH-932
> URL: https://issues.apache.org/jira/browse/NUTCH-932
> Project: Nutch
> Issue Type: New Feature
> Components: REST_api
> Affects Versions: 2.0
> Reporter: Andrzej Bialecki
> Assignee: Andrzej Bialecki
> Attachments: NUTCH-932.patch
>
>
> It would be useful to be able to retrieve results of a crawl as JSON. There
> are a few things that need to be discussed:
> * how to return bulk results using Restlet (WritableRepresentation subclass?)
> * what should be the format of results?
> I think it would make sense to provide retrieval of a single record (by
> primary key), of all records, and of records within a range. This
> incidentally matches the capabilities of the Gora Query class well :)
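The three retrieval modes proposed in the description (single key, all records, key range) can be sketched roughly as follows. This is only an illustration and is not taken from NUTCH-932.patch: the class `CrawlResultSketch`, its methods, and the `key`/`title` record layout are hypothetical, and a sorted in-memory map stands in for the Gora-backed store. Records are written to a `Writer` one at a time, which is the general idea behind returning bulk results without building the whole response in memory.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: an in-memory stand-in for the crawl store.
// None of these names come from Nutch, Gora, or Restlet.
final class CrawlResultSketch {
    // Kept sorted by key so that range scans are natural, as in a key-ordered store.
    private final NavigableMap<String, String> rows = new TreeMap<>();

    void put(String key, String title) {
        rows.put(key, title);
    }

    // Mode 1: single record by primary key (empty result if the key is absent).
    NavigableMap<String, String> byKey(String key) {
        NavigableMap<String, String> out = new TreeMap<>();
        String value = rows.get(key);
        if (value != null) {
            out.put(key, value);
        }
        return out;
    }

    // Mode 2: all records.
    NavigableMap<String, String> all() {
        return rows;
    }

    // Mode 3: records whose keys fall within [startKey, endKey], both inclusive.
    NavigableMap<String, String> range(String startKey, String endKey) {
        return rows.subMap(startKey, true, endKey, true);
    }

    // Writes the records as a JSON array, one record at a time.
    static void writeJson(Map<String, String> records, Writer out) throws IOException {
        out.write('[');
        boolean first = true;
        for (Map.Entry<String, String> e : records.entrySet()) {
            if (!first) {
                out.write(',');
            }
            first = false;
            out.write("{\"key\":\"" + escape(e.getKey())
                    + "\",\"title\":\"" + escape(e.getValue()) + "\"}");
        }
        out.write(']');
    }

    // Minimal JSON string escaping: quotes, backslashes, and control characters.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Convenience wrapper that collects the JSON into a String.
    static String toJson(Map<String, String> records) {
        StringWriter w = new StringWriter();
        try {
            writeJson(records, w);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return w.toString();
    }

    public static void main(String[] args) {
        CrawlResultSketch store = new CrawlResultSketch();
        store.put("a", "Page A");
        store.put("b", "Page B");
        store.put("c", "Page C");
        System.out.println(toJson(store.byKey("b")));
        System.out.println(toJson(store.range("a", "b")));
        System.out.println(toJson(store.all()));
    }
}
```

In a real endpoint the `Writer` would be the one supplied by the REST framework's response, and the map lookups would presumably be replaced by a store query configured with either a single key or a start/end key pair.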