Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "Nutch_1.X_RESTAPI" page has been changed by SujenShah:
https://wiki.apache.org/nutch/Nutch_1.X_RESTAPI?action=diff&rev1=5&rev2=6

        "result":null,
        "state":"FINISHED",
        "msg":"",
-       "crawlId":"crawl-01"
+       "crawlId":"crawl01"
     }
  }}}}
  
@@ -204, +204 @@

  {{{{
  POST /job/create
     {
-       "crawlId":"crawl-01",
+       "crawlId":"crawl01",
        "type":"FETCH",
        "confId":"default",
        "args":{"someParam":"someValue"}
@@ -212, +212 @@

  
  POST /job/create
     {
-       "crawlId":"crawl-01",
+       "crawlId":"crawl01",
        "jobClassName":"org.apache.nutch.fetcher.FetcherJob",
        "confId":"default",
        "args":{"someParam":"someValue"}
@@ -224, +224 @@

      job-id-43243
  }}}}
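
As a rough illustration of the two request shapes above, the following Python sketch builds a `/job/create` body either from a job `type` or from a `jobClassName`. The helper name `build_job_body` is hypothetical, and the rule that exactly one of the two fields is supplied is an assumption drawn from the two examples, not something the page states.

```python
import json


def build_job_body(crawl_id, conf_id="default", args=None,
                   job_type=None, job_class_name=None):
    """Return the JSON text for a POST /job/create request (hypothetical helper)."""
    # Assumption: a request names the job either by "type" or by "jobClassName".
    if (job_type is None) == (job_class_name is None):
        raise ValueError("pass exactly one of job_type or job_class_name")
    body = {"crawlId": crawl_id}
    if job_type is not None:
        body["type"] = job_type
    else:
        body["jobClassName"] = job_class_name
    body["confId"] = conf_id
    body["args"] = args if args is not None else {}
    return json.dumps(body)


# The two variants shown in the wiki examples.
by_type = build_job_body("crawl01", job_type="FETCH",
                         args={"someParam": "someValue"})
by_class = build_job_body("crawl01",
                          job_class_name="org.apache.nutch.fetcher.FetcherJob",
                          args={"someParam": "someValue"})
print(by_type)
print(by_class)
```

Serializing with `json.dumps` rather than writing the body by hand avoids small syntax slips such as a missing comma between fields.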
  
- === URL ===
+ === Database ===
  
- This point is created in order to get the required information about a URL or list of URLs to generate a D3 visualization. The information obtained from this API point will help 
+ This point provides access to information stored in the CrawlDb.  
  {{{{
- GET /url/{filtered-url}
+ POST /db/crawldb with following
+ {     "type":"stats",
+       "confId":"default",
+       "crawlId":"crawl01",
+       "args":{"someParam":"someValue"}
+ }
  }}}}
- __Response__ contains information about the url from the CrawlDbReader.java class. The parameters are
+ Other values for the type parameter are dump, topN and url. Their corresponding arguments are listed [[https://wiki.apache.org/nutch/bin/nutch%20readdb|here]].
+ 
+ __Response__ contains information from the CrawlDbReader.java class. For the request above, the JSON response would look like:
  {{{{
-    {
+   {
+       "retry 0":"8350",
-       "url" : "",
-       "statusCode" : "",
-       "fetchTime" : "",
-       "score" : "",
+       "minScore":"0.0",
-       "numOfInlinks" : "",
-       "numOfOutlinks" : "",
+       "retry 1":"96",
+       "status":{ 
+                 "3":{"count":"21","statusValue":"db_gone"},
+                 "2":{"count":"594","statusValue":"db_fetched"},
+                 "1":{"count":"7721","statusValue":"db_unfetched"},
+                 "5":{"count":"86","statusValue":"db_redir_perm"},
+                 "4":{"count":"24","statusValue":"db_redir_temp"}
+                 },
+       "totalUrls":"8446",
+       "maxScore":"0.528",
+       "avgScore":"0.029593771"
-    }
+   }
  }}}}
+ 
+ '''Note: ''' If any type other than stats (such as dump, topN or url) is used, the response will be a file (application/octet-stream).
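
A minimal Python sketch of how a client might prepare the `stats` request and read the response described above. The base URL (host and port) and the helper names `build_crawldb_request` and `summarize_stats` are assumptions for illustration; the request is only constructed, not sent, and the sample response is the one shown on this page.

```python
import json
import urllib.request

# Assumption: the Nutch server is reachable here; adjust host and port as needed.
BASE_URL = "http://localhost:8081"


def build_crawldb_request(query_type, crawl_id, conf_id="default", args=None):
    """Build (but do not send) a POST /db/crawldb request (hypothetical helper)."""
    body = {
        "type": query_type,
        "confId": conf_id,
        "crawlId": crawl_id,
        "args": args or {},
    }
    return urllib.request.Request(
        BASE_URL + "/db/crawldb",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_stats(response_text):
    """Return (total URL count, {status name: count}) from a stats response."""
    stats = json.loads(response_text)
    # All values arrive as strings in the sample, so convert before using them.
    counts = {entry["statusValue"]: int(entry["count"])
              for entry in stats["status"].values()}
    return int(stats["totalUrls"]), counts


# Sample response copied from the example on this page.
sample = """
{
  "retry 0": "8350", "minScore": "0.0", "retry 1": "96",
  "status": {
    "3": {"count": "21", "statusValue": "db_gone"},
    "2": {"count": "594", "statusValue": "db_fetched"},
    "1": {"count": "7721", "statusValue": "db_unfetched"},
    "5": {"count": "86", "statusValue": "db_redir_perm"},
    "4": {"count": "24", "statusValue": "db_redir_temp"}
  },
  "totalUrls": "8446", "maxScore": "0.528", "avgScore": "0.029593771"
}
"""

request = build_crawldb_request("stats", "crawl01")
total, counts = summarize_stats(sample)
print(request.get_method(), request.full_url)
print(total, counts)
```

For the other types (dump, topN, url) the response is a file rather than JSON, so a client would write the raw bytes to disk instead of calling a JSON parser.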
  
  == More ==
  Description of more API points coming soon.
