[jira] [Commented] (NUTCH-2036) Adding some continuous crawl goodies to the crawl script

Hudson (JIRA) Thu, 25 Jun 2015 07:51:30 -0700

    [ 
https://issues.apache.org/jira/browse/NUTCH-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601291#comment-14601291
 ]


Hudson commented on NUTCH-2036:
-------------------------------

SUCCESS: Integrated in Nutch-trunk #3174 (See 
[https://builds.apache.org/job/Nutch-trunk/3174/])
Adding some continuous crawl goodies to the crawl script NUTCH-2036 (jnioche: 
http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1687522)
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/bin/crawl


> Adding some continuous crawl goodies to the crawl script
> --------------------------------------------------------
>
>                 Key: NUTCH-2036
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2036
>             Project: Nutch
>          Issue Type: Improvement
>          Components: bin, tool, util
>    Affects Versions: 1.10
>            Reporter: Jorge Luis Betancourt Gonzalez
>            Priority: Minor
>              Labels: crawl, script
>             Fix For: 1.11
>
>         Attachments: NUTCH-2036-v2.patch, NUTCH-2036.patch
>
>
> Nutch does not support continuous crawling out of the box. Although this 
> can be approximated with cron, and is sometimes irrelevant given the size 
> of the crawl, it is a nice feature to have. 
> This patch adds a new option to the {{bin/crawl}} script (-w|--wait) that 
> sets how long to wait when the generator returns 0 (that is, when no URLs 
> are scheduled for fetching). 
> This new parameter has the {{NUMBER\[SUFFIX\]}} format. If no suffix is 
> provided, the amount of time is assumed to be in seconds. Valid suffixes 
> are: 
> s - seconds
> m - minutes
> h - hours
> d - days
> If {{-1}} is passed to the parameter, or the parameter is not used at all, 
> the default behaviour of exiting the script is used.
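
The {{NUMBER\[SUFFIX\]}} format described above could be converted to seconds roughly as follows. This is only a minimal sketch and is not taken from the attached patches: the function name to_seconds and its error message are hypothetical, and it assumes NUMBER is a non-negative integer, with {{-1}} passed through unchanged to mean "do not wait, exit".

```shell
#!/bin/sh
# Hypothetical helper, not the code from NUTCH-2036.patch.
# Converts NUMBER[SUFFIX] (suffix: s, m, h, d; default s) into seconds.
# Prints -1 unchanged, since -1 means "exit instead of waiting".
to_seconds() {
  ts_value="$1"

  if [ "$ts_value" = "-1" ]; then
    echo "-1"
    return 0
  fi

  # Split the trailing suffix (if any) from the numeric part.
  case "$ts_value" in
    *[smhd])
      ts_number="${ts_value%?}"               # everything but the last char
      ts_suffix="${ts_value#"$ts_number"}"    # the last char
      ;;
    *)
      ts_number="$ts_value"
      ts_suffix="s"                           # no suffix: assume seconds
      ;;
  esac

  # Reject empty or non-integer numbers (assumption: integers only).
  case "$ts_number" in
    ""|*[!0-9]*)
      echo "invalid wait value: $ts_value" >&2
      return 1
      ;;
  esac

  # Strip leading zeros so the shell does not read the number as octal.
  while [ "${#ts_number}" -gt 1 ] && [ "${ts_number#0}" != "$ts_number" ]; do
    ts_number="${ts_number#0}"
  done

  case "$ts_suffix" in
    s) echo "$ts_number" ;;
    m) echo $(( ts_number * 60 )) ;;
    h) echo $(( ts_number * 3600 )) ;;
    d) echo $(( ts_number * 86400 )) ;;
  esac
}
```

With this sketch, a wait value of 5m yields 300 and 2h yields 7200, which a calling script could hand to sleep before trying the generate step again.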



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
