[ 
https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742466#comment-16742466
 ] 

Steve Loughran commented on HADOOP-16018:
-----------------------------------------

If it didn't happen before the patch, and it happens now, I'm not sure that it 
can be tagged as unrelated. But it could just be this is the first patch to go 
near tools/distcp for a while

How about you attach a branch-2 patch here which does nothing but some harmess 
diff to a documentation page or something, which will be enough to get it 
tested? That'll help isolate it -if that is the problem then we'll have to see 
what's up with jenkins. If, on the other hand, it only happens with this patch, 
then something is up here (very unlikely considering)

> DistCp won't reassemble chunks when blocks per chunk > 0
> --------------------------------------------------------
>
>                 Key: HADOOP-16018
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16018
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: tools/distcp
>    Affects Versions: 3.2.0, 2.9.2
>            Reporter: Kai Xie
>            Assignee: Kai Xie
>            Priority: Major
>             Fix For: 3.0.4, 3.2.1, 3.1.3
>
>         Attachments: HADOOP-16018-002.patch, HADOOP-16018-branch-2-002.patch, 
> HADOOP-16018-branch-2-002.patch, HADOOP-16018-branch-2-003.patch, 
> HADOOP-16018-branch-2-003.patch, HADOOP-16018.01.patch
>
>
> I was investigating why hadoop-distcp-2.9.2 won't reassemble chunks of the 
> same file when blocks per chunk has been set > 0.
> In the CopyCommitter::commitJob, this logic can prevent chunks from 
> reassembling if blocks per chunk is equal to 0:
> {code:java}
> if (blocksPerChunk > 0) {
>   concatFileChunks(conf);
> }
> {code}
> Then in CopyCommitter's ctor, blocksPerChunk is initialised from the config:
> {code:java}
> blocksPerChunk = context.getConfiguration().getInt(
> DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel(), 0);
> {code}
>  
> But here the config key DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel() 
> will always returns empty string because it is constructed without config 
> label:
> {code:java}
> BLOCKS_PER_CHUNK("",
> new Option("blocksperchunk", true, "If set to a positive value, files"
> + "with more blocks than this value will be split into chunks of "
> + "<blocksperchunk> blocks to be transferred in parallel, and "
> + "reassembled on the destination. By default, <blocksperchunk> is "
> + "0 and the files will be transmitted in their entirety without "
> + "splitting. This switch is only applicable when the source file "
> + "system implements getBlockLocations method and the target file "
> + "system implements concat method"))
> {code}
> As a result it will fall back to the default value 0 for blocksPerChunk, and 
> prevent the chunks from reassembling.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

Reply via email to