[ https://issues.apache.org/jira/browse/NUTCH-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974207#comment-14974207 ]
Aron Ahmadia commented on NUTCH-2149:
-------------------------------------
Hi Sujen and Chris.
I think I'm missing something in this API. Looking through it, most of the
services seem to require a "full path" to the file to be read. This
seems like both a security risk (it allows the user to read arbitrary files on
the file system) and an unnecessary load on the API. Does Nutch have some
concept of a "root" directory for each configuration?
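For illustration only, here is a minimal Java sketch of how a per-configuration root directory could confine requests. It assumes the server knows one root path per configuration and that clients send paths relative to it. `SafePathResolver` and `resolve` are hypothetical names, not existing Nutch code. Absolute paths and anything that escapes the root via `..` are rejected.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper (not part of Nutch): resolves a client-supplied
 * relative path against a configured root directory and rejects anything
 * that would escape it, e.g. "../../etc/passwd" or an absolute path.
 */
final class SafePathResolver {

    private SafePathResolver() {}

    static Path resolve(Path root, String requested) {
        // Normalize the root once so the prefix check below is reliable.
        Path normalizedRoot = root.toAbsolutePath().normalize();
        Path candidate = Paths.get(requested);
        // An absolute path would bypass the root entirely, so refuse it.
        if (candidate.isAbsolute()) {
            throw new IllegalArgumentException("absolute paths are not allowed: " + requested);
        }
        // Resolve against the root, then collapse "." and ".." segments.
        Path resolved = normalizedRoot.resolve(candidate).normalize();
        // Path.startsWith compares whole name elements, so "/data/crawl2"
        // is not considered to be inside "/data/crawl".
        if (!resolved.startsWith(normalizedRoot)) {
            throw new IllegalArgumentException("path escapes the root directory: " + requested);
        }
        return resolved;
    }
}
```

This only normalizes path strings and does not follow symbolic links, so a real implementation would probably also compare the `toRealPath()` results of the root and the resolved file.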
Also, it's not clear how to get a listing of the link, node, and sequence
files. Is this available somewhere else in the REST API? How would I (as a
REST interface user) know what path to provide?
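Assuming such a root exists, a listing endpoint could be backed by something like the following sketch, which returns every regular file under the root as a relative path that the client can pass straight back. `RootListing` and `listRelative` are hypothetical names, not existing Nutch code. The sketch uses the local `java.nio.file` API; data stored on HDFS would need Hadoop's `FileSystem` API instead.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical helper (not part of Nutch): lists every regular file under a
 * root directory as a path relative to that root, so a REST client could
 * discover valid inputs instead of guessing absolute paths.
 */
final class RootListing {

    private RootListing() {}

    static List<String> listRelative(Path root) throws IOException {
        Path normalizedRoot = root.toAbsolutePath().normalize();
        // Files.walk holds open directory handles, so close the stream.
        try (Stream<Path> paths = Files.walk(normalizedRoot)) {
            return paths
                    .filter(Files::isRegularFile)
                    // Report paths relative to the root, never absolute ones,
                    // using "/" as the separator on every platform.
                    .map(p -> normalizedRoot.relativize(p).toString()
                            .replace(File.separatorChar, '/'))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```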
> REST endpoint to read Nutch sequence files
> ------------------------------------------
>
> Key: NUTCH-2149
> URL: https://issues.apache.org/jira/browse/NUTCH-2149
> Project: Nutch
> Issue Type: New Feature
> Components: REST_api
> Reporter: Sujen Shah
> Assignee: Sujen Shah
> Labels: memex
> Fix For: 1.12
>
>
> This endpoint enables reading of the webgraph data like nodes, links and any
> other sequence file in the Nutch ecosystem via a RESTful interface.
> The current API documentation for this Reader endpoint is available at -
> http://docs.nutchpytonutchrestapi.apiary.io/
> Thanks to https://github.com/ContinuumIO/nutchpy for the initial work.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)