[jira] Commented: (HADOOP-6072) distcp should place the file distcp_src_files in distributed cache

Doug Cutting (JIRA) Fri, 19 Jun 2009 10:36:31 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721896#action_12721896 ]


Doug Cutting commented on HADOOP-6072:
--------------------------------------

Some comments:
 - 'sleeptime' should be read via 'getSleeptime()' to be thread safe, no?  Or 
maybe use an int for the sleep time, since updates to an int are atomic.
 - getNumRunningMaps() is expensive to call from each node at each interval, 
since reports for all tasks must be retrieved from the JT.  It would be better 
to just fetch the job's counters each time, since they're constant-sized, not 
proportional to the number of tasks.  You'd need to add a maps_completed 
counter, then use the difference between that and TOTAL_LAUNCHED_MAPS to 
calculate the number running.
 - The interval for contacting the JT might be randomized a bit, so that not 
all tasks hit it at the same time, e.g., by adding a random value of up to 10% 
of the specified value.
 - When InterruptedException is caught, a thread should generally exit, not 
simply log a warning.  If things will no longer work correctly without the 
thread, then it should somehow cause dependent threads to fail too.
 - getNumRunningMaps() should either return a correct value or throw an 
exception.  If it cannot contact the JT, or if the task does not know its ID, 
it should fail, no?
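A minimal Java sketch of a polling thread that follows the suggestions above, under stated assumptions: `JobCounterSource`, `RunningMapsMonitor` and their methods are hypothetical names, not taken from the attached patch or from the Hadoop API, and the interface stands in for however the task would actually fetch the job's counters from the JT (TOTAL_LAUNCHED_MAPS plus the proposed maps_completed counter). Because atomic int writes alone do not guarantee that another thread sees the new value, the sketch also marks the sleep time `volatile`. The jitter adds a random extra of up to 10% of the base interval, which is one reading of the suggestion.

```java
import java.io.IOException;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for fetching one job's counters from the JobTracker. */
interface JobCounterSource {
  /** Current TOTAL_LAUNCHED_MAPS value; throws if the JT cannot be reached. */
  long launchedMaps() throws IOException;

  /** Current value of the proposed maps_completed counter (hypothetical). */
  long completedMaps() throws IOException;
}

/** Illustrative per-task monitor thread that polls the JT for the running-map count. */
final class RunningMapsMonitor implements Runnable {
  private final JobCounterSource counters;
  // volatile: a new value set by another thread becomes visible to the polling loop.
  private volatile int sleepTimeMs;
  private volatile long runningMaps = -1;
  // First failure seen; threads that depend on this monitor can check it and fail too.
  private final AtomicReference<Throwable> failure = new AtomicReference<>();

  RunningMapsMonitor(JobCounterSource counters, int sleepTimeMs) {
    this.counters = counters;
    setSleepTimeMs(sleepTimeMs);
  }

  void setSleepTimeMs(int ms) {
    if (ms < 0) {
      throw new IllegalArgumentException("sleep time must be >= 0: " + ms);
    }
    sleepTimeMs = ms;
  }

  int getSleepTimeMs() { return sleepTimeMs; }

  long getRunningMaps() { return runningMaps; }

  Throwable getFailure() { return failure.get(); }

  /**
   * Running maps = launched - completed, computed from two constant-sized counters
   * rather than per-task reports. Throws instead of returning a guessed value.
   */
  long getNumRunningMaps() throws IOException {
    long launched = counters.launchedMaps();
    long completed = counters.completedMaps();
    if (completed > launched) {
      throw new IOException("inconsistent counters: completed=" + completed
          + " > launched=" + launched);
    }
    return launched - completed;
  }

  /** Base interval plus a random 0..10% extra, so tasks do not poll the JT in lockstep. */
  static long jitteredDelay(int baseMs) {
    return baseMs + ThreadLocalRandom.current().nextLong(baseMs / 10 + 1);
  }

  @Override
  public void run() {
    try {
      while (true) {
        runningMaps = getNumRunningMaps();
        // Thread.sleep throws immediately if the interrupt flag is already set.
        Thread.sleep(jitteredDelay(getSleepTimeMs()));
      }
    } catch (InterruptedException e) {
      // Exit rather than log-and-continue; record why, and restore the flag.
      failure.compareAndSet(null, e);
      Thread.currentThread().interrupt();
    } catch (IOException e) {
      // No correct value is available, so record the failure and stop polling.
      failure.compareAndSet(null, e);
    }
  }
}
```

Code that depends on the monitor (for example, the map's copy loop) could check `getFailure()` and throw if it is non-null, so a dead monitor makes the task fail instead of silently running with a stale count.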

> distcp should place the file distcp_src_files in distributed cache
> ------------------------------------------------------------------
>
>                 Key: HADOOP-6072
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6072
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: tools/distcp
>    Affects Versions: 0.21.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>             Fix For: 0.21.0
>
>         Attachments: d_replica_srcfilelist.patch
>
>
> When a large number of files is being copied by distcp, accessing 
> distcp_src_files seems to be an issue, since all map tasks read 
> this file. The error message seen is:
> 09/06/16 10:13:16 INFO mapred.JobClient: Task Id : attempt_200906040559_0110_m_003348_0, Status : FAILED
> java.io.IOException: Could not obtain block: blk_-4229860619941366534_1500174 file=/mapredsystem/hadoop/mapredsystem/distcp_7fiyvq/_distcp_src_files
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1757)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1585)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1712)
>         at java.io.DataInputStream.readFully(DataInputStream.java:178)
>         at java.io.DataInputStream.readFully(DataInputStream.java:152)
>         at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>         at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
>         at org.apache.hadoop.tools.DistCp$CopyInputFormat.getRecordReader(DistCp.java:299)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:336)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
> This could be because of HADOOP-6038 and/or HADOOP-4681.
> If distcp placed this special file distcp_src_files in the distributed cache, 
> that could solve the problem.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.