[ 
https://issues.apache.org/jira/browse/NUTCH-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974207#comment-14974207
 ] 

Aron Ahmadia commented on NUTCH-2149:
-------------------------------------

Hi Sujen and Chris.

I think I'm missing something in this API.  Looking through it, most of the
services seem to require a "full path" to the file that should be read.  This
seems like both a security risk (it lets the user read arbitrary files on
the file system) and an unnecessary load on the API.  Does Nutch have some
concept of a "root" directory for each configuration?
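
To make the concern concrete, here is a minimal sketch of how a server could
confine requested paths to a configured root. It is written in Python for
illustration only and is not Nutch code: the function name
resolve_under_root and the idea of a per-configuration root directory are
assumptions, since it is not clear from the API whether Nutch has one.

```python
from pathlib import Path


def resolve_under_root(root: str, relative: str) -> Path:
    """Resolve `relative` against `root`, rejecting paths that escape it.

    Hypothetical helper: `root` stands in for a per-configuration
    directory that the REST service would be allowed to read from.
    """
    root_path = Path(root).resolve()
    # Joining with an absolute path discards root_path, and ".." segments
    # are collapsed by resolve(), so both cases are caught by the check below.
    candidate = (root_path / relative).resolve()
    try:
        # relative_to raises ValueError when candidate is not inside root_path.
        candidate.relative_to(root_path)
    except ValueError:
        raise PermissionError(f"path escapes the root directory: {relative}")
    return candidate
```

With something like this, a client would send a path relative to the root
(for example a segment or linkdb path) and a request for "../../etc/passwd"
or "/etc/passwd" would be refused rather than read.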

Also, it's not clear how to get a listing of the link, node, and sequence 
files.  Is this available somewhere else in the REST API?  How would I (as a 
REST interface user) know what path to provide?

> REST endpoint to read Nutch sequence files
> ------------------------------------------
>
>                 Key: NUTCH-2149
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2149
>             Project: Nutch
>          Issue Type: New Feature
>          Components: REST_api
>            Reporter: Sujen Shah
>            Assignee: Sujen Shah
>              Labels: memex
>             Fix For: 1.12
>
>
> This endpoint enables reading of the webgraph data like nodes, links and any 
> other sequence file in the Nutch ecosystem via a RESTful interface. 
> The current API documentation for this Reader endpoint is available at - 
> http://docs.nutchpytonutchrestapi.apiary.io/
> Thanks to https://github.com/ContinuumIO/nutchpy for the initial work. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
