Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "CommonCrawlDataDumper" page has been changed by darrencheng:
https://wiki.apache.org/nutch/CommonCrawlDataDumper?action=diff&rev1=2&rev2=3

  bin/nutch commoncrawldump -outputDir outCommonCrawl -segment testCrawl/segments
  }}}
  
+ If the script later fails with an {{{OutOfMemoryError}}}, try raising the 
{{{JAVA_HEAP_MAX}}} variable on line 128 of {{{bin/nutch}}} to an appropriate 
value. 
+ 
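As a rough illustration of the heap setting mentioned above, the sketch below builds a JVM maximum-heap flag from a size in megabytes. It assumes that {{{JAVA_HEAP_MAX}}} holds a standard {{{-Xmx}}} flag; {{{heap_flag}}} is a hypothetical helper, not part of Nutch, and the 4000 MB figure is only an example value.

```shell
#!/bin/sh
# Sketch only: heap_flag is a hypothetical helper, not part of bin/nutch.
# It turns a heap size in megabytes into a JVM max-heap flag.
heap_flag() {
  printf '%s' "-Xmx${1}m"
}

# Example value (an assumption): request roughly 4 GB of heap.
# In bin/nutch, a string of this form would be assigned to JAVA_HEAP_MAX.
JAVA_HEAP_MAX="$(heap_flag 4000)"
echo "$JAVA_HEAP_MAX"
```

Some copies of {{{bin/nutch}}} also appear to read a {{{NUTCH_HEAPSIZE}}} environment variable (in megabytes) to override the default, which would avoid editing the script; check your copy before relying on it.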
  The {{{bin/nutch commoncrawldump}}} program dumps all Nutch segments 
included in {{{testCrawl/segments}}} to the {{{outCommonCrawl}}} folder, writing 
one CBOR-encoded file for each crawled file. The tool will show a short report 
as follows:
  
  {{{
