Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "Nutch_1.X_RESTAPI" page has been changed by SujenShah:
https://wiki.apache.org/nutch/Nutch_1.X_RESTAPI?action=diff&rev1=5&rev2=6

    "result":null,
    "state":"FINISHED",
    "msg":"",
-   "crawlId":"crawl-01"
+   "crawlId":"crawl01"
  }
  }}}}
@@ -204, +204 @@
  {{{{
  POST /job/create
  {
-   "crawlId":"crawl-01",
+   "crawlId":"crawl01",
    "type":"FETCH",
    "confId":"default",
    "args":{"someParam":"someValue"}
@@ -212, +212 @@
  POST /job/create
  {
-   "crawlId":"crawl-01",
+   "crawlId":"crawl01",
    "jobClassName":"org.apache.nutch.fetcher.FetcherJob",
    "confId":"default",
    "args":{"someParam":"someValue"}
@@ -224, +224 @@
  job-id-43243
  }}}}
- === URL ===
+ === Database ===
- This point is created in order to get the required information about a URL or list of URLs to generate a D3 visualization. The information obtained from this API point will help
+ This point provides access to information stored in the CrawlDb.
  {{{{
- GET /url/{filtered-url}
+ POST /db/crawldb with following
+ { "type":"stats",
+ "confId":"default",
+ "crawlId":"crawl01",
+ "args":{"someParam":"someValue"}
+ }
  }}}}
- __Response__ contains information about the url from the CrawlDbReader.java class. The parameters are
+ The different values for the type parameter are - dump, topN and url. Their corresponding arguments can be found [[https://wiki.apache.org/nutch/bin/nutch%20readdb|here]].
+
+ __Response__ contains information from the CrawlDbReader.java class.
For the above mentioned request, the JSON response would look like:
  {{{{
- {
+ {
+   "retry 0":"8350",
-   "url" : "",
-   "statusCode" : "",
-   "fetchTime" : "",
-   "score" : "",
+   "minScore":"0.0",
-   "numOfInlinks" : "",
-   "numOfOutlinks" : "",
+   "retry 1":"96",
+   "status":{
+     "3":{"count":"21","statusValue":"db_gone"},
+     "2":{"count":"594","statusValue":"db_fetched"},
+     "1":{"count":"7721","statusValue":"db_unfetched"},
+     "5":{"count":"86","statusValue":"db_redir_perm"},
+     "4":{"count":"24","statusValue":"db_redir_temp"}
+   },
+   "totalUrls":"8446",
+   "maxScore":"0.528",
+   "avgScore":"0.029593771"
- }
+ }
  }}}}
+
+ '''Note: ''' If any type other than stats (such as dump, topN or url) is used, the response will be a file (application/octet-stream).
  == More ==
  Description of more API points coming soon.
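
As an illustration of the new POST /db/crawldb point described in the diff above, here is a minimal Python sketch of a client. It builds the request body with the same four fields shown on the page and turns the string-valued stats response into numbers. The function names (build_crawldb_request, summarize_stats, post_crawldb) and the base_url parameter are hypothetical and not part of Nutch. The server's host and port are not given in this change, so post_crawldb takes them as an assumption and is not called in the demo.

```python
import json
from urllib import request

# Sample stats response copied from the wiki change above.
SAMPLE_STATS = {
    "retry 0": "8350",
    "minScore": "0.0",
    "retry 1": "96",
    "status": {
        "3": {"count": "21", "statusValue": "db_gone"},
        "2": {"count": "594", "statusValue": "db_fetched"},
        "1": {"count": "7721", "statusValue": "db_unfetched"},
        "5": {"count": "86", "statusValue": "db_redir_perm"},
        "4": {"count": "24", "statusValue": "db_redir_temp"},
    },
    "totalUrls": "8446",
    "maxScore": "0.528",
    "avgScore": "0.029593771",
}


def build_crawldb_request(crawl_id, query_type="stats", conf_id="default", args=None):
    """Build the JSON body for POST /db/crawldb with the fields shown on the page."""
    return {
        "type": query_type,
        "confId": conf_id,
        "crawlId": crawl_id,
        "args": args or {},
    }


def summarize_stats(stats):
    """Convert the string-valued stats response into numbers keyed by status name."""
    by_status = {
        entry["statusValue"]: int(entry["count"])
        for entry in stats["status"].values()
    }
    return {
        "totalUrls": int(stats["totalUrls"]),
        "minScore": float(stats["minScore"]),
        "maxScore": float(stats["maxScore"]),
        "avgScore": float(stats["avgScore"]),
        "byStatus": by_status,
    }


def post_crawldb(base_url, body):
    """POST body to <base_url>/db/crawldb and decode a JSON reply.

    base_url is an assumption: use whatever address the Nutch server listens on.
    Only the "stats" type returns JSON; other types return a file instead.
    """
    req = request.Request(
        base_url.rstrip("/") + "/db/crawldb",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(json.dumps(build_crawldb_request("crawl01"), indent=2))
    print(json.dumps(summarize_stats(SAMPLE_STATS), indent=2))
```

Every value in the stats response is a JSON string, so a client has to convert counts and scores itself. In the sample, the per-status counts add up to totalUrls, which makes a convenient sanity check.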

