[ 
https://issues.apache.org/jira/browse/NUTCH-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340742#comment-16340742
 ] 

ASF GitHub Bot commented on NUTCH-2501:
---------------------------------------

sebastian-nagel commented on a change in pull request #279: NUTCH-2501: Take
NUTCH_HEAPSIZE into account when crawling using crawl script
URL: https://github.com/apache/nutch/pull/279#discussion_r164055125
 
 

 ##########
 File path: src/bin/crawl
 ##########
 @@ -105,6 +105,10 @@ SIZE_FETCHLIST=50000 # 25K x NUM_TASKS
 TIME_LIMIT_FETCH=180
 NUM_THREADS=50
 SITEMAPS_FROM_HOSTDB_FREQUENCY=never
+NUTCH_HEAP_MB=2000
 
 Review comment:
   I've just noticed that NUTCH_HEAPSIZE (and also NUTCH_OPTS) is not used by
bin/nutch in distributed mode
([L326](https://github.com/apache/nutch/blob/e533ab21b18cf81a49e052185562a7e6489ec4d6/src/bin/nutch#L326)).
 If that is the cause of the problem, I would also fix it in bin/nutch.
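
A rough sketch of the idea, not the actual bin/nutch code: a heap size given in NUTCH_HEAPSIZE could be turned into a JVM flag in one place, so that local and distributed mode can share it. The helper name `heap_flag` and the 1000 MB fallback are assumptions made for illustration, as is the reading of NUTCH_HEAPSIZE as a size in MB. How the flag would then reach the Hadoop job in distributed mode depends on the cluster setup and is not shown here.

```shell
# Hypothetical helper (not taken from bin/nutch): build a JVM heap flag
# from NUTCH_HEAPSIZE, which is assumed here to be a size in MB.
heap_flag() {
  # $1 is an optional fallback in MB; 1000 is an assumed default.
  default_mb="${1:-1000}"
  mb="${NUTCH_HEAPSIZE:-$default_mb}"
  printf -- '-Xmx%sm\n' "$mb"
}

# Print the flag that would be used with the current environment.
heap_flag
```

With `NUTCH_HEAPSIZE=2000` exported, the helper prints `-Xmx2000m`; with the variable unset it falls back to the assumed default.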



> Take into account $NUTCH_HEAPSIZE when crawling using crawl script
> ------------------------------------------------------------------
>
>                 Key: NUTCH-2501
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2501
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Moreno Feltscher
>            Assignee: Lewis John McGibbney
>            Priority: Major
>




