[ https://issues.apache.org/jira/browse/NUTCH-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745330#comment-14745330 ]

Nadeem Douba edited comment on NUTCH-2097 at 9/15/15 12:23 PM:
---------------------------------------------------------------

Re: Maven migration

Would building each tool into its own jar, instead of one monolithic jar, be a 
better approach? I'm not a Hadoop expert, but doesn't that 100 MB jar get 
uploaded to all the nodes each time a job is dispatched? If so, shouldn't we 
split the tools into separate jars and submit the smaller jars to the Hadoop 
cluster instead? This would significantly reduce network I/O costs in cloud 
environments. Or is this just overkill?

However, if this is a good idea, then moving the writables, io, and common 
classes into something like a nutch-common jar would seem beneficial.
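
To put rough numbers on the network I/O concern above, here is a 
back-of-the-envelope sketch. It assumes the worst case raised in the question, 
namely that every node pulls the full job jar once per job with no caching, 
which may not match how Hadoop actually distributes jars. The node count, job 
count, and the 10 MB per-tool jar size are hypothetical; only the roughly 
100 MB monolithic jar size comes from the comment.

```python
def estimated_transfer_mb(jar_size_mb, nodes, jobs):
    """Rough upper bound: every node pulls the full job jar once per job.

    Assumption: no caching or jar reuse between jobs on a node.
    """
    return jar_size_mb * nodes * jobs


# Hypothetical cluster: 20 nodes, 50 jobs dispatched.
monolithic = estimated_transfer_mb(jar_size_mb=100, nodes=20, jobs=50)
# Hypothetical per-tool jar of 10 MB (tool classes plus a shared common jar).
per_tool = estimated_transfer_mb(jar_size_mb=10, nodes=20, jobs=50)

print(f"monolithic jar: {monolithic} MB")
print(f"per-tool jar:   {per_tool} MB")
print(f"difference:     {monolithic - per_tool} MB")
```

If the cluster caches the jar per node, or the jar is only shipped to the nodes 
that run tasks, the real figure would be lower, so this is only an upper bound 
on what splitting the jars could save.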


was (Author: ndouba):
Re: Maven migration

Would building each tool into its own jar, instead of one monolithic jar, be a 
better approach? I'm not a Hadoop expert, but doesn't that 100 MB jar get 
uploaded to all the nodes each time a job is dispatched? If so, shouldn't we 
split the tools into separate jars and submit the smaller jars to the Hadoop 
cluster instead? This would significantly reduce network I/O costs in cloud 
environments. Or is this just overkill?

However, if this is a good idea, then moving the writables and io classes 
into something like a nutch-io jar would seem beneficial.

> Proposal for Nutch 3.x
> ----------------------
>
>                 Key: NUTCH-2097
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2097
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.12
>            Reporter: Nadeem Douba
>            Assignee: Lewis John McGibbney
>
> This is a parent issue which contains a proposal for Nutch 3.x. It's based on 
> my branch (mr2-mvn at https://github.com/allfro/nutch).  



