Sujen Shah updated NUTCH-2152:
------------------------------
Attachment: NUTCH-2152.git.patch
Here is the first iteration of the patch.
The CommonCrawl dump can be invoked through the service endpoint as follows:
1. POST /services/commoncrawldump
   Request data (application/json):
   {
     "confId": "default",
     "crawlId": "crawl01",
     "args": {"mimetypes": ["text/html", "", .....], other params}
   }
   Response: the path of the created resource (text/plain).
2. To get all dump paths for a particular crawlId, call:
   GET /services/commoncrawldump/{crawlId}
   Response (application/json):
   {
     "dumpPaths": [......list of paths.....]
   }
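The two calls above can be exercised from a small client. The following is a minimal Python sketch using only the standard library. It assumes the Nutch REST server is reachable at http://localhost:8081; adjust BASE_URL to match your deployment. All function names here (build_dump_request, create_dump, dump_paths_url, parse_dump_paths, list_dump_paths) and the extra_args parameter are hypothetical. Because the "other params" accepted under "args" are not enumerated, extra_args is merged into "args" unchanged.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Nutch REST server running locally on port 8081.
BASE_URL = "http://localhost:8081"
ENDPOINT = "/services/commoncrawldump"


def build_dump_request(conf_id, crawl_id, mimetypes, extra_args=None,
                       base_url=BASE_URL):
    """Build (but do not send) the POST request that creates a dump."""
    args = {"mimetypes": list(mimetypes)}
    if extra_args:
        # Hypothetical pass-through for the unlisted "other params".
        args.update(extra_args)
    body = json.dumps({"confId": conf_id, "crawlId": crawl_id, "args": args})
    return urllib.request.Request(
        url=base_url + ENDPOINT,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_dump(conf_id, crawl_id, mimetypes, extra_args=None,
                base_url=BASE_URL):
    """Send the POST request; the text/plain body holds the dump path."""
    req = build_dump_request(conf_id, crawl_id, mimetypes, extra_args, base_url)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8").strip()


def dump_paths_url(crawl_id, base_url=BASE_URL):
    """URL for GET /services/commoncrawldump/{crawlId}."""
    # Percent-encode the crawlId so it stays a single path segment.
    return f"{base_url}{ENDPOINT}/{urllib.parse.quote(crawl_id, safe='')}"


def parse_dump_paths(body):
    """Extract the "dumpPaths" list from the JSON response body."""
    return list(json.loads(body).get("dumpPaths", []))


def list_dump_paths(crawl_id, base_url=BASE_URL):
    """Fetch every dump path recorded for the given crawlId."""
    req = urllib.request.Request(dump_paths_url(crawl_id, base_url),
                                 headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return parse_dump_paths(resp.read().decode("utf-8"))
```

Against a running server, `create_dump("default", "crawl01", ["text/html"])` would issue the POST from step 1. `list_dump_paths("crawl01")` would then presumably include the returned path. Building the request separately from sending it lets the payload be inspected or tested without a live server.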
> CommonCrawl dump via Service endpoint
> -------------------------------------
>
> Key: NUTCH-2152
> URL: https://issues.apache.org/jira/browse/NUTCH-2152
> Project: Nutch
> Issue Type: Sub-task
> Components: REST_api
> Affects Versions: 1.12
> Reporter: Sujen Shah
> Assignee: Sujen Shah
> Labels: memex
> Fix For: 1.12
>
> Attachments: NUTCH-2152.git.patch
>
>