[
https://issues.apache.org/jira/browse/HDFS-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946138#comment-15946138
]
Andrew Wang commented on HDFS-10971:
------------------------------------
This looks great. A couple nitty comments:
{noformat}
170 // If the target directory is Erasure Coded, target FileSystem
takes care
171 // of creating the erasure coded file where the file replication
factor
172 // is not applicable and the passed in replication factor is
ignored.
{noformat}
Don't need to capitalize Erasure Coded. Also to be a little more accurate, I'd
say something like:
If there is an erasure coding policy set on the target directory, files will be
written to the target directory using this EC policy. The replication factor of
the source file is ignored and not preserved.
{noformat}
239 // Preserve the replication attribute only when both source and
240 // destination files are not Erasure Coded.
{noformat}
I'd expand this comment a bit to cover all the conditions in the if statement.
You could also restate the current as "The replication factor can only be
preserved for replicated files. It is ignored when either the source or target
file are erasure coded."
Finally, in the new unit test:
* It'd be good to add messages to the asserts as a form of documentation.
* The test is also named "testPreserve..." whereas we might want to name it
"testReplFactorNotPreserved..." or "...Ignored..." for clarity
* Consider doing a static import on the asserts to make them a little more
concise
* We test EC src and repl dest, should we also test repl src and EC dst? Also
add EC to EC test with different EC policies for completeness?
* We added StripedFileTestUtil#getDefaultECPolicy, maybe we also add a helper
function like DFSTestUtil#enableAllECPolicies but instead
#enableDefaultECPolicy? Perhaps I should have originally put
#enableAllECPolicies in StripedFileTestUtil also. Could also be a cleanup JIRA
for later.
> Distcp should not copy replication factor if source file is erasure coded
> -------------------------------------------------------------------------
>
> Key: HDFS-10971
> URL: https://issues.apache.org/jira/browse/HDFS-10971
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: distcp
> Affects Versions: 3.0.0-alpha1
> Reporter: Wei-Chiu Chuang
> Assignee: Manoj Govindassamy
> Priority: Blocker
> Labels: hdfs-ec-3.0-must-do
> Attachments: HDFS-10971.01.patch, HDFS-10971.testcase.patch
>
>
> The current erasure coding implementation uses replication factor field to
> store erasure coding policy.
> Distcp copies the source file's replication factor to the destination if
> {{-pr}} is specified. However, if the source file is EC, the replication
> factor (which is EC policy) should not be replicated to the destination file.
> When a HdfsFileStatus is converted to FileStatus, the replication factor is
> set to 0 if it's an EC file.
> In fact, I will attach a test case that shows trying to replicate the
> replication factor of an EC file results in an IOException: "Requested
> replication factor of 0 is less than the required minimum of 1 for
> /tmp/dst/dest2"
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]