[
https://issues.apache.org/jira/browse/HDFS-11965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074338#comment-16074338
]
Uma Maheswara Rao G commented on HDFS-11965:
--------------------------------------------
Hi [~surendrasingh]
Sorry for coming late. Thank you for all good work. Thanks Rakesh for nice
reviews.
However, I had the following feedback on the patch. I also got chance to have a
chat with Rakesh about alternative proposal below.
Comments:
# blockStorageMovementNeeded.increaseRetryCount(trackId); : I feel there could
be some race condition when blocks has enough replicas in retry time. Lets
assume, you found low redundancy and incremented retry count. When retrying the
same element again, if block reaches enough replicas, you may not get chance to
remove that retry count. that may leak in that case?
# isAllBlockSatisfyPolicy: I feel this method is having lot of duplicate code.
this method no longer needed if we agree on below proposal.
# .
{code}
} else {
+ blockStorageMovementNeeded.removeRetry(trackId);
+ }
{code}
log missing.
# Nice test cases
*To simplify this, here is an alternative approach:*
How about you add another state case FEW_LOW_REDUNDENCY_BLOCKS: ? If the
status is FEW_LOW_REDUNDENCY_BLOCKS, you could call
this.storageMovementsMonitor.add(blockCollectionID, false);
So, this will not remove xattrs while processing the result.
Also you can change parameter name allBlockLocsAttemptedToSatisfy in ‘public
void add(Long blockCollectionID,
boolean allBlockLocsAttemptedToSatisfy)’ to noRetry. Right now this is
boolean, if you want to make it more readable, we can make this as small object
with reason and retry count.
Example: public void add(Long blockCollectionID, RetryInfo retryInfo)
RetryInfo contains: 1. reason for retry 2. retriedCount
Seems like you are trying to make definite retry for underReplicated. May be
you could file another JIRA for handling definite retry for all Items
generically. Today, if file is not able to satisfy for some reasons like not
enough storages etc, we may keep retry. So, that new JIRA can investigate how
to make retrials configurable.
> [SPS]: Should give chance to satisfy the low redundant blocks before removing
> the xattr
> ---------------------------------------------------------------------------------------
>
> Key: HDFS-11965
> URL: https://issues.apache.org/jira/browse/HDFS-11965
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Affects Versions: HDFS-10285
> Reporter: Surendra Singh Lilhore
> Assignee: Surendra Singh Lilhore
> Attachments: HDFS-11965-HDFS-10285.001.patch,
> HDFS-11965-HDFS-10285.002.patch, HDFS-11965-HDFS-10285.003.patch,
> HDFS-11965-HDFS-10285.004.patch, HDFS-11965-HDFS-10285.005.patch
>
>
> The test case is failing because all the required replicas are not moved in
> expected storage. This is happened because of delay in datanode registration
> after cluster restart.
> Scenario :
> 1. Start cluster with 3 DataNodes.
> 2. Create file and set storage policy to WARM.
> 3. Restart the cluster.
> 4. Now Namenode and two DataNodes started first and got registered with
> NameNode. (one datanode not yet registered)
> 5. SPS scheduled block movement based on available DataNodes (It will move
> one replica in ARCHIVE based on policy).
> 6. Block movement also success and Xattr removed from the file because this
> condition is true {{itemInfo.isAllBlockLocsAttemptedToSatisfy()}}.
> {code}
> if (itemInfo != null
> && !itemInfo.isAllBlockLocsAttemptedToSatisfy()) {
> blockStorageMovementNeeded
> .add(storageMovementAttemptedResult.getTrackId());
> ....................
> ......................
> } else {
> ....................
> ......................
> this.sps.postBlkStorageMovementCleanup(
> storageMovementAttemptedResult.getTrackId());
> }
> {code}
> 7. Now third DN registered with namenode and its reported one more DISK
> replica. Now Namenode has two DISK and one ARCHIVE replica.
> In test case we have condition to check the number of DISK replica..
> {code} DFSTestUtil.waitExpectedStorageType(testFileName, StorageType.DISK, 1,
> timeout, fs);{code}
> This condition never became true and test case will be timed out.
>
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]