[GitHub] [nutch] pmezard opened a new pull request #534: NUTCH-2793 indexer-csv: make it work in distributed mode

GitBox Wed, 10 Jun 2020 05:15:03 -0700


pmezard opened a new pull request #534:
URL: https://github.com/apache/nutch/pull/534



   Before the change, the output file name was hard-coded to "nutch.csv".
   When running in distributed mode, multiple reducers would clobber each
   other's output.
   
   After the change, the file name is taken from the first open(cfg, name)
   initialization call, where name is a unique file name generated by
   IndexerOutputFormat, which derives from Hadoop's FileOutputFormat. Each
   reducer therefore writes its own CSV file, named like part-r-000xx.
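
The effect of the change can be illustrated with a small, self-contained sketch. It does not use the Nutch or Hadoop APIs: the class `CsvPartNameSketch` and its methods `partName`, `open` and `write` are hypothetical stand-ins, and a temporary local directory plays the role of the job output directory. The only things taken from the description above are the hard-coded "nutch.csv" name, the idea that `open` receives a unique name, and the part-r-000xx naming pattern.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.Locale;
import java.util.stream.Stream;

public class CsvPartNameSketch {

  // Hypothetical stand-in for the unique per-reducer name; it only mimics
  // the part-r-000xx pattern mentioned in the description.
  public static String partName(int reducerId) {
    return String.format(Locale.ROOT, "part-r-%05d", reducerId);
  }

  // Hypothetical stand-in for open(cfg, name): the caller-supplied name,
  // not a constant, decides which file is written.
  public static Path open(Path outputDir, String name) throws IOException {
    Files.createDirectories(outputDir);
    return outputDir.resolve(name);
  }

  // Writes a single row, truncating any existing file of the same name
  // (the default behavior of Files.write).
  public static void write(Path file, String row) throws IOException {
    Files.write(file, Collections.singletonList(row), StandardCharsets.UTF_8);
  }

  private static long countFiles(Path dir) throws IOException {
    try (Stream<Path> files = Files.list(dir)) {
      return files.count();
    }
  }

  public static void main(String[] args) throws IOException {
    // Before: every "reducer" uses the same hard-coded name, so the
    // second write replaces the first.
    Path before = Files.createTempDirectory("csv-before");
    write(open(before, "nutch.csv"), "row from reducer 0");
    write(open(before, "nutch.csv"), "row from reducer 1");
    System.out.println("hard-coded name, files written: " + countFiles(before));

    // After: each "reducer" is handed its own name, so both outputs survive.
    Path after = Files.createTempDirectory("csv-after");
    write(open(after, partName(0)), "row from reducer 0");
    write(open(after, partName(1)), "row from reducer 1");
    System.out.println("unique names, files written: " + countFiles(after));
  }
}
```

In the sketch the unique name is computed locally for simplicity; in the pull request it is supplied by IndexerOutputFormat rather than built by the CSV writer itself.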


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
