[ https://issues.apache.org/jira/browse/NUTCH-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrzej Bialecki updated NUTCH-880:
------------------------------------

    Attachment: API.patch

Initial patch for discussion. This is a work in progress, so only some functionality is implemented, and even less than that is actually working ;)

I would appreciate a review and comments.

> REST API (and webapp) for Nutch
> -------------------------------
>
>                 Key: NUTCH-880
>                 URL: https://issues.apache.org/jira/browse/NUTCH-880
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 2.0
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>         Attachments: API.patch
>
>
> This issue is for discussing a REST-style API for accessing Nutch.
> Here's an initial idea:
> * I propose to use org.restlet for handling requests and returning
> JSON/XML/whatever responses.
> * Hook up all regular tools so that they can be driven via this API. This
> would have to be an async API, since all Nutch operations take a long time
> to execute. It follows that we also need to be able to list running
> operations, retrieve their current status, and possibly
> abort/cancel/stop/suspend/resume/...? This also means that we would
> potentially have to create & manage many threads in a servlet. AFAIK this
> is frowned upon by J2EE purists...
> * Package this in a webapp (that includes all deps, essentially nutch.job
> content), with the restlet servlet as an entry point.
> Open issues:
> * How to implement the reading of crawl results via this API.
> * Should we manage only crawls that use a single configuration per webapp,
> or should we have a notion of crawl contexts (sets of crawl configs) with
> CRUD ops on them? This would be nice, because it would allow managing
> several different crawls, with different configs, in a single webapp, but
> it complicates the implementation a lot.
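
To make the async requirement from the description concrete, here is a minimal sketch of the bookkeeping such an API might need: start an operation in the background, hand back an id right away, and allow listing, status lookup and abort by id. Everything in it is an assumption made for illustration. The class name SketchJobRegistry, its methods and the State values are hypothetical and are not taken from API.patch, and the sketch uses only java.util.concurrent. It leaves out the org.restlet layer that would map HTTP requests onto these calls, and it does not attempt suspend/resume.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical job registry for illustration; not the code in API.patch. */
class SketchJobRegistry {
    /** RUNNING means "submitted and not yet ended" (it may still be queued). */
    enum State { RUNNING, FINISHED, FAILED, ABORTED }

    // Daemon threads so that a forgotten job cannot keep the JVM alive.
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "sketch-job");
        t.setDaemon(true);
        return t;
    });
    private final Map<String, Future<?>> jobs = new ConcurrentHashMap<>();
    private final AtomicInteger counter = new AtomicInteger();

    /** Starts the work in the background and returns a job id immediately. */
    String submit(String tool, Runnable work) {
        String id = tool + "-" + counter.incrementAndGet();
        jobs.put(id, pool.submit(work));
        return id;
    }

    /** Returns the current state of a job, or null if the id is unknown. */
    State status(String id) {
        Future<?> f = jobs.get(id);
        if (f == null) return null;
        if (!f.isDone()) return State.RUNNING;
        if (f.isCancelled()) return State.ABORTED;
        try {
            f.get(); // the job has ended, so this returns or throws at once
            return State.FINISHED;
        } catch (ExecutionException e) {
            return State.FAILED; // the work threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Snapshot of all known jobs and their states, sorted by id. */
    Map<String, State> list() {
        Map<String, State> out = new TreeMap<>();
        for (String id : jobs.keySet()) out.put(id, status(id));
        return out;
    }

    /** Interrupts a job that has not ended yet; false if unknown or already ended. */
    boolean abort(String id) {
        Future<?> f = jobs.get(id);
        return f != null && f.cancel(true);
    }

    /** Waits up to timeoutMillis for the job to end, then reports its state. */
    State waitFor(String id, long timeoutMillis) {
        Future<?> f = jobs.get(id);
        if (f != null) {
            try {
                f.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (ExecutionException | TimeoutException | CancellationException e) {
                // Nothing to do here: status() below reports the outcome.
            }
        }
        return status(id);
    }

    public static void main(String[] args) {
        SketchJobRegistry registry = new SketchJobRegistry();
        String id = registry.submit("inject", () -> { /* a real tool would run here */ });
        System.out.println(id + " -> " + registry.waitFor(id, 5000));
    }
}
```

Note that abort() here relies on thread interruption, so it only has an effect if the running tool checks for interruption or blocks in an interruptible call. A tool that ignores interrupts would need its own cooperative stop flag.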