Thanks to all for the responses.
 
1) I think I had assumed it would default to the name of the file itself.  
Might that be a worthwhile default behavior?
2) Looks like it might be a host-based firewall issue.
3) I started working on a patch to support setting # of threads per task, 
testing it now.
I've tried not to affect the non-threaded code path, by making synchronized 
wrappers for each of the input, output, and reporter classes, which are only 
used when threading is enabled (see the sketch below).  I'm not sure whether 
it's worth the effort to maintain these wrappers instead of just making the 
underlying classes thread-safe?
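Roughly, each wrapper is just a thin synchronized adapter around the real 
object. Here's an illustrative sketch (not the actual patch code; the exact 
OutputCollector generics differ slightly across Hadoop versions):

import java.io.IOException;
import org.apache.hadoop.mapred.OutputCollector;

public class SynchronizedOutputCollector<K, V> implements OutputCollector<K, V> {
  private final OutputCollector<K, V> delegate;

  public SynchronizedOutputCollector(OutputCollector<K, V> delegate) {
    this.delegate = delegate;
  }

  // Serialize concurrent collect() calls from the task's worker
  // threads onto the single underlying collector.
  public synchronized void collect(K key, V value) throws IOException {
    delegate.collect(key, value);
  }
}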
 
 
________________________________

From: Amareshwari Sri Ramadasu [mailto:[EMAIL PROTECTED]]
Sent: Sun 3/2/2008 8:09 PM
To: [email protected]
Subject: Re: Bugs in 0.16.0?



Holden Robbins wrote:
> Hello,
> 
> I'm just starting to dig into Hadoop and testing its feasibility for 
> large-scale development work. 
> I was wondering if anyone else is being affected by these issues using 
> Hadoop 0.16.0?
> I searched Jira, and I'm not sure if I saw anything that specifically fit 
> some of these:
> 
> 1) The symlinks for the distributed cache in the task directory are being 
> created as 'null' directory links (stated another way, the name of the 
> symbolic link in the directory is the string literal "null").  Am I doing 
> something wrong to cause this, or does this functionality just not get much 
> use?
>  
If you want to create symlinks for the distributed cache, the URI has to have
a symlink fragment, like hdfs://host:port/<absolute-path>#<link>, and
mapred.create.symlink must be set to "yes".
If mapred.create.symlink is "yes" and the link fragment is not provided, the
distributed cache will create a symlink with the literal name "null", as you
said.
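For example, something like the following sets it up from job code (a rough
sketch; the host, port, and file names below are just placeholders):

import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobConf;

public class CacheSymlinkSetup {
  public static void configure(JobConf conf) throws Exception {
    // The fragment after '#' names the symlink created in the task's
    // working directory; omit it and the link comes out as "null".
    DistributedCache.addCacheFile(
        new URI("hdfs://namenode:9000/cache/lookup.dat#lookup.dat"), conf);
    // Without this property the symlink is not created at all.
    conf.set("mapred.create.symlink", "yes");
  }
}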
> 
> 2) I'm running into an issue where the job is giving errors in the form:
> 08/03/01 09:44:25 INFO mapred.JobClient: Task Id : 
> task_200803010908_0001_r_000002_0, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 08/03/01 09:44:25 WARN mapred.JobClient: Error reading task outputGo-Box4
> 08/03/01 09:44:25 WARN mapred.JobClient: Error reading task outputGo-Box4
>
> The jobs appear to never finish the reduce phase once this happens.  The tasks 
> themselves are long-running map tasks (up to 10 minutes per input); as far as 
> I understand from the Jira posts, this is related to 
> MAX_FAILED_UNIQUE_FETCHES being hard-coded to 4?  Is there a known workaround 
> or fix in the pipeline?
> 
> Possible related jira post: https://issues.apache.org/jira/browse/HADOOP-2220
> Improving the way the shuffling mechanism works may also help? 
> https://issues.apache.org/jira/browse/HADOOP-1339
> 
> I've tried setting:
> <property>
>   <name>mapred.reduce.copy.backoff</name>
>   <value>1440</value>
>   <description>The maximum amount of time (in seconds) a reducer spends on  
> fetching one map output before declaring it as failed.</description>
> </property>
>  which should be 24 minutes, but it had no effect.
> 
> 
> 3) Lastly, it would seem beneficial for jobs that have significant startup 
> overhead and memory requirements to not be run in separate JVMs for each 
> task.  Along these lines, it looks like someone submitted a patch for 
> JVM-reuse a while back, but it wasn't committed? 
> https://issues.apache.org/jira/browse/HADOOP-249
> 
> Probably a question for the dev mailing list, but if I wanted to modify 
> Hadoop to allow threading tasks, rather than running independent JVMs, is 
> there any reason someone hasn't done this yet?  Or am I overlooking something?
> 
> 
> Thanks,
> -Holden
>
>  


