[
https://issues.apache.org/jira/browse/NUTCH-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745330#comment-14745330
]
Nadeem Douba edited comment on NUTCH-2097 at 9/15/15 12:23 PM:
---------------------------------------------------------------
Re: Maven migration
Would building each tool into a separate jar file, instead of one monolithic
jar, be a better approach? I'm not a Hadoop expert, but doesn't that 100 MB jar
get uploaded to all the nodes each time a job is dispatched? If so, shouldn't
we isolate the tools into separate jars and submit the smaller jars to the
Hadoop cluster instead? This would significantly reduce network I/O costs in
cloud environments. Or is this just overkill?
However, if this is a good idea, then relocating the writables, io, and common
classes to something like a nutch-common jar would seem beneficial.
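
To put rough numbers on the network I/O question, here is a back-of-the-envelope sketch. It assumes the worst case, where every node fetches one copy of the job jar per job and nothing is cached. The node count, job count, and the 10 MB per-tool jar size are made-up figures for illustration; only the 100 MB monolithic jar comes from the discussion above.

```python
def jar_transfer_mb(jar_mb, nodes, jobs):
    """Worst-case MB moved: each node fetches the job jar once per job.

    Ignores any caching or reuse between jobs (an assumption, not a
    statement about how a given Hadoop cluster actually behaves).
    """
    return jar_mb * nodes * jobs


# Hypothetical cluster: 20 nodes, 50 jobs dispatched.
monolithic = jar_transfer_mb(100, nodes=20, jobs=50)  # one 100 MB jar
per_tool = jar_transfer_mb(10, nodes=20, jobs=50)     # assumed 10 MB tool jar

print(monolithic, per_tool)  # 100000 10000
```

Under these assumptions the saving scales linearly with jar size, so whether the split is worth it depends mostly on how small the per-tool jars end up once shared classes are factored out.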
was (Author: ndouba):
Re: Maven migration
Would building each tool into a separate jar file, instead of one monolithic
jar, be a better approach? I'm not a Hadoop expert, but doesn't that 100 MB jar
get uploaded to all the nodes each time a job is dispatched? If so, shouldn't
we isolate the tools into separate jars and submit the smaller jars to the
Hadoop cluster instead? This would significantly reduce network I/O costs in
cloud environments. Or is this just overkill?
However, if this is a good idea, then relocating the writables and io classes
to something like a nutch-io jar would seem beneficial.
> Proposal for Nutch 3.x
> ----------------------
>
> Key: NUTCH-2097
> URL: https://issues.apache.org/jira/browse/NUTCH-2097
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 1.12
> Reporter: Nadeem Douba
> Assignee: Lewis John McGibbney
>
> This is a parent issue which contains a proposal for Nutch 3.x. It's based on
> my branch (mr2-mvn at https://github.com/allfro/nutch).