Thanks Todd.

Unfortunately, I'm using Cascading on top of Hadoop, so I'm not sure there's an easy way to force the local jobs it fires off to use a different configuration. I'll talk to the Cascading folks and find out.

J


Quoting Todd Lipcon <[email protected]>:

Hi Jeremy,

That's a good point: we don't currently do a good job of separating the
configuration used by the LocalJobRunner (LJR) from the configuration used
by the TaskTracker. In particular, I think mapred.local.dir and
mapred.system.dir are both shared between the two.

You run into the same issue when trying to use LJR on a system with a
configured cluster, even if not using the LinuxTaskController features.

I'd recommend creating a separate Hadoop conf/ directory with a different
setting for mapred.local.dir.
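A minimal sketch of that suggestion, assuming the other files in the existing conf/ directory (core-site.xml, hdfs-site.xml, and so on) are copied into the new directory by hand. The class name `LocalConfWriter` and the example paths are hypothetical; the sketch only writes a mapred-site.xml that overrides mapred.local.dir, using Hadoop's standard `<configuration>/<property>` file layout.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical helper: writes a minimal mapred-site.xml into a separate
 * conf directory so that jobs run through the LocalJobRunner get their own
 * mapred.local.dir instead of sharing the TaskTracker's.
 */
final class LocalConfWriter {

    private LocalConfWriter() {}

    /** Renders a Hadoop-style configuration file containing one property. */
    static String render(String localDir) {
        return "<?xml version=\"1.0\"?>\n"
            + "<configuration>\n"
            + "  <property>\n"
            + "    <name>mapred.local.dir</name>\n"
            + "    <value>" + escape(localDir) + "</value>\n"
            + "  </property>\n"
            + "</configuration>\n";
    }

    /** Minimal XML escaping for the property value text. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Writes mapred-site.xml into confDir, creating the directory if needed. */
    static Path write(Path confDir, String localDir) throws IOException {
        Files.createDirectories(confDir);
        Path file = confDir.resolve("mapred-site.xml");
        Files.write(file, render(localDir).getBytes(StandardCharsets.UTF_8));
        return file;
    }
}
```

The submitting user would then point the client at that directory (for example via HADOOP_CONF_DIR or `hadoop --config <dir>`), choosing a mapred.local.dir that user owns, so the TaskTracker's own local directory is never touched. Whether Cascading picks up that directory for the local jobs it launches is not confirmed here.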

-Todd

On Fri, May 6, 2011 at 11:45 AM, <[email protected]> wrote:

Hi,

I'm running Hadoop (Cloudera release 3) in pseudo-distributed mode with
the LinuxTaskController, so that jobs run as the user who submitted
them.

My program (which uses Cascading) fires off a job using the
LocalJobRunner (I think to read data from the local filesystem). So far so
good.
The job creates the directory
/var/lib/hadoop-0.20/cache/pseudo/localRunner
(/var/lib/hadoop-0.20/cache/pseudo being the value of mapred.local.dir)

The problem is that localRunner isn't owned by the user mapred. Instead, it's
owned by the user who submitted the job. The next time I restart the
daemons, the TaskTracker fails because it can't rename
/var/lib/hadoop-0.20/cache/pseudo/localRunner.

Does anybody have suggestions on how to fix this?

Thanks
Jeremy





--
Todd Lipcon
Software Engineer, Cloudera
