[ 
https://issues.apache.org/jira/browse/HDFS-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946138#comment-15946138
 ] 

Andrew Wang commented on HDFS-10971:
------------------------------------

This looks great. A couple nitty comments:

{noformat}
170           // If the target directory is Erasure Coded, target FileSystem 
takes care
171           // of creating the erasure coded file where the file replication 
factor
172           // is not applicable and the passed in replication factor is 
ignored.
{noformat}

Don't need to capitalize Erasure Coded. Also to be a little more accurate, I'd 
say something like:

If there is an erasure coding policy set on the target directory, files will be 
written to the target directory using this EC policy. The replication factor of 
the source file is ignored and not preserved.

{noformat}
239         // Preserve the replication attribute only when both source and
240         // destination files are not Erasure Coded.
{noformat}

I'd expand this comment a bit to cover all the conditions in the if statement. 
You could also restate the current as "The replication factor can only be 
preserved for replicated files. It is ignored when either the source or target 
file are erasure coded."

Finally, in the new unit test:
* It'd be good to add messages to the asserts as a form of documentation.
* The test is also named "testPreserve..." whereas we might want to name it 
"testReplFactorNotPreserved..." or "...Ignored..." for clarity
* Consider doing a static import on the asserts to make them a little more 
concise
* We test EC src and repl dest, should we also test repl src and EC dst? Also 
add EC to EC test with different EC policies for completeness?
* We added StripedFileTestUtil#getDefaultECPolicy, maybe we also add a helper 
function like DFSTestUtil#enableAllECPolicies but instead 
#enableDefaultECPolicy? Perhaps I should have originally put 
#enableAllECPolicies in StripedFileTestUtil also. Could also be a cleanup JIRA 
for later.

> Distcp should not copy replication factor if source file is erasure coded
> -------------------------------------------------------------------------
>
>                 Key: HDFS-10971
>                 URL: https://issues.apache.org/jira/browse/HDFS-10971
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: distcp
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Wei-Chiu Chuang
>            Assignee: Manoj Govindassamy
>            Priority: Blocker
>              Labels: hdfs-ec-3.0-must-do
>         Attachments: HDFS-10971.01.patch, HDFS-10971.testcase.patch
>
>
> The current erasure coding implementation uses replication factor field to 
> store erasure coding policy.
> Distcp copies the source file's replication factor to the destination if 
> {{-pr}} is specified. However, if the source file is EC, the replication 
> factor (which is EC policy) should not be replicated to the destination file. 
> When a HdfsFileStatus is converted to FileStatus, the replication factor is 
> set to 0 if it's an EC file.
> In fact, I will attach a test case that shows trying to replicate the 
> replication factor of an EC file results in an IOException: "Requested 
> replication factor of 0 is less than the required minimum of 1 for 
> /tmp/dst/dest2"



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to