[jira] Commented: (NUTCH-880) REST API for Nutch

Alexis (JIRA) Fri, 05 Nov 2010 17:27:06 -0700

    [ https://issues.apache.org/jira/browse/NUTCH-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928896#action_12928896 ]


Alexis commented on NUTCH-880:
------------------------------

This revision introduced a bug in the nutch inject command. It now throws a 
NullPointerException.

Please take a look at:
http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/crawl/InjectorJob.java?annotate=1028235&pathrev=1028235

The fix is to store the first job in the `jobs` array so that its first element is not null:

{noformat}
Index: src/java/org/apache/nutch/crawl/InjectorJob.java
===================================================================
--- src/java/org/apache/nutch/crawl/InjectorJob.java    (revision 1031881)
+++ src/java/org/apache/nutch/crawl/InjectorJob.java    (working copy)
@@ -242,6 +242,7 @@
     job.setReducerClass(Reducer.class);
     job.setNumReduceTasks(0);
     job.waitForCompletion(true);
+    jobs[0] = job;

     job = new NutchJob(getConf(), "inject-p2 " + args[0]);
     StorageUtils.initMapperJob(job, FIELDS, String.class,
{noformat}
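The failure mode can be reproduced in isolation. The sketch below is a hypothetical, self-contained illustration, not Nutch code: `FakeJob` stands in for `NutchJob`, the job names are made up, and it assumes that some caller later dereferences every slot of the `jobs` array. It shows that leaving `jobs[0]` unset yields a NullPointerException and that the one-line assignment from the patch avoids it.

```java
import java.util.ArrayList;
import java.util.List;

public class JobsArrayDemo {
    // Hypothetical stand-in for a Hadoop/Nutch job; NOT the real NutchJob class.
    static final class FakeJob {
        private final String name;
        FakeJob(String name) { this.name = name; }
        String getJobName() { return name; }
    }

    // Mirrors the shape of the injector: two jobs run in sequence and the
    // array of jobs is handed back to the caller afterwards.
    static FakeJob[] runInject(boolean recordFirstJob) {
        FakeJob[] jobs = new FakeJob[2];
        FakeJob job = new FakeJob("inject-p1");   // hypothetical name
        if (recordFirstJob) {
            jobs[0] = job;                        // the one-line fix from the patch
        }
        job = new FakeJob("inject-p2");           // the variable is reused, as in the diff
        jobs[1] = job;
        return jobs;
    }

    // Assumed caller behaviour: it reads every element of the array.
    static List<String> jobNames(FakeJob[] jobs) {
        List<String> names = new ArrayList<>();
        for (FakeJob j : jobs) {
            names.add(j.getJobName());            // NPE here if a slot was never filled
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(jobNames(runInject(true)));
        try {
            jobNames(runInject(false));
        } catch (NullPointerException e) {
            System.out.println("NullPointerException without jobs[0] = job");
        }
    }
}
```

Because `job` is reassigned to the second job right after the first one completes, the reference to the first job is lost unless it is saved before the reassignment, which is exactly where the patch adds `jobs[0] = job;`.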


> REST API for Nutch
> ------------------
>
>                 Key: NUTCH-880
>                 URL: https://issues.apache.org/jira/browse/NUTCH-880
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 2.0
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
>             Fix For: 2.0
>
>         Attachments: API-2.patch, API.patch
>
>
> This issue is for discussing a REST-style API for accessing Nutch.
> Here's an initial idea:
> * I propose to use org.restlet for handling requests and returning 
> JSON/XML/whatever responses.
> * hook up all regular tools so that they can be driven via this API. This
> would have to be an async API, since all Nutch operations take a long time to
> execute. It follows then that we also need to be able to list running
> operations, retrieve their current status, and possibly
> abort/cancel/stop/suspend/resume/...? This also means that we would have to
> potentially create & manage many threads in a servlet - AFAIK this is frowned
> upon by J2EE purists...
> * package this in a webapp (that includes all deps, essentially nutch.job 
> content), with the restlet servlet as an entry point.
> Open issues:
> * how to implement the reading of crawl results via this API
> * should we manage only crawls that use a single configuration per webapp, or 
> should we have a notion of crawl contexts (sets of crawl configs) with CRUD 
> ops on them? This would be nice, because it would allow managing several
> different crawls, with different configs, in a single webapp - but it 
> complicates the implementation a lot.
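The asynchronous model in the proposal (start a long-running tool, list operations, poll status, cancel) can be sketched with the JDK alone. This is a minimal, hypothetical sketch: it does not use org.restlet, it is not the API that was eventually committed to Nutch, and the names `OperationManager`, `submit`, `list`, `status` and `cancel` are all made up for illustration. A REST layer would map calls such as "start inject" or "get status of op-3" onto these methods.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical registry of long-running operations; not part of Nutch. */
public class OperationManager {
    public enum Status { RUNNING, SUCCEEDED, FAILED, CANCELLED }

    // A small bounded pool instead of spawning an ad-hoc thread per request.
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final Map<String, Future<?>> ops = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    /** Starts a tool asynchronously and returns an ID the client can poll. */
    public String submit(Callable<?> tool) {
        String id = "op-" + nextId.getAndIncrement();
        ops.put(id, pool.submit(tool));
        return id;
    }

    /** Lists the IDs of all known operations, finished or not. */
    public Set<String> list() {
        return new TreeSet<>(ops.keySet());
    }

    /** RUNNING means "not finished yet": the task may still be queued. */
    public Status status(String id) {
        Future<?> f = ops.get(id);
        if (f == null) throw new IllegalArgumentException("unknown operation " + id);
        if (f.isCancelled()) return Status.CANCELLED;
        if (!f.isDone()) return Status.RUNNING;
        try {
            f.get();                      // already done, so this does not block
            return Status.SUCCEEDED;
        } catch (CancellationException e) {
            return Status.CANCELLED;
        } catch (ExecutionException e) {
            return Status.FAILED;         // the tool threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Status.RUNNING;
        }
    }

    /** Requests cancellation, interrupting the worker thread if it is running. */
    public boolean cancel(String id) {
        Future<?> f = ops.get(id);
        return f != null && f.cancel(true);
    }

    public void shutdown() { pool.shutdownNow(); }
}
```

Routing every tool through one bounded executor keeps thread creation in a single place, which is one way to soften the servlet threading concern raised above. Suspend/resume is not covered here, since `Future` only supports cancellation.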

-- 
This message is automatically generated by JIRA.