Hi Henning, I don't know if you've seen https://issues.apache.org/jira/browse/MAPREDUCE-1938 and https://issues.apache.org/jira/browse/MAPREDUCE-1700 which have discussion about this issue.
Cheers,
Tom

On Fri, Sep 24, 2010 at 3:41 AM, Henning Blohm <[email protected]> wrote:
> Short update on the issue:
>
> I tried to find a way to separate class path configurations by modifying
> the scripts in HADOOP_HOME/bin, but found that TaskRunner actually copies
> the class path setting from the parent process when starting a local task,
> so I do not see a way of having less on a job's classpath without
> modifying Hadoop.
>
> As that will present a real issue when running our jobs on Hadoop, I would
> like to propose changing TaskRunner so that it sets a class path
> specifically for M/R tasks. That class path could be defined in the scripts
> (as for the other processes) using a particular environment variable (e.g.
> HADOOP_JOB_CLASSPATH). It could default to the current VM's class path,
> preserving today's behavior.
>
> Is it ok to enter this as an issue?
>
> Thanks,
> Henning
>
>
> On Friday, 17.09.2010, at 16:01 +0000, Allen Wittenauer wrote:
>
> On Sep 17, 2010, at 4:56 AM, Henning Blohm wrote:
>
>> When running map reduce tasks in Hadoop I run into classpath issues.
>> Contrary to previous posts, my problem is not that I am missing classes
>> on the Task's class path (we have a perfect solution for that) but rather
>> that I find too many (e.g. ECJ classes or jetty).
>
> The fact that you mention:
>
>> The libs in HADOOP_HOME/lib seem to contain everything needed to run
>> anything in Hadoop which is, I assume, much more than is needed to run a
>> map reduce task.
>
> hints that your perfect solution is to throw all your custom stuff in lib.
> If so, that's a huge mistake. Use the distributed cache instead.
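As a rough sketch of Allen's suggestion (the jar and path names below are hypothetical): instead of dropping job dependencies into HADOOP_HOME/lib, they can be shipped per-job through the distributed cache, for example with the -libjars option that GenericOptionsParser understands:

```shell
# Hypothetical invocation: ship job-specific dependencies via the
# distributed cache rather than HADOOP_HOME/lib. The -libjars option
# (handled by GenericOptionsParser, assuming the job uses Tool/ToolRunner)
# uploads the listed jars and adds them to the tasks' classpath only,
# so they do not pollute the daemons' class path.
hadoop jar myjob.jar com.example.MyJob \
    -libjars mydep1.jar,mydep2.jar \
    /input/path /output/path
```

This keeps the task classpath scoped to the job, though it does not remove the framework jars (jetty, ECJ, etc.) that Henning is concerned about, which is what the proposed HADOOP_JOB_CLASSPATH change would address.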
