[ 
https://issues.apache.org/jira/browse/HDFS-11965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074338#comment-16074338
 ] 

Uma Maheswara Rao G commented on HDFS-11965:
--------------------------------------------

Hi [~surendrasingh]

Sorry for coming in late. Thank you for all the good work, and thanks Rakesh for 
the nice reviews.
However, I have the following feedback on the patch. I also got a chance to 
chat with Rakesh about an alternative proposal, described below.
 
Comments:
# blockStorageMovementNeeded.increaseRetryCount(trackId); : I feel there could 
be a race condition when the block gains enough replicas before the retry. Let's 
assume you found low redundancy and incremented the retry count. When the same 
element is retried, if the block has reached enough replicas in the meantime, 
you may not get a chance to remove that retry count, so the entry may leak in 
that case.
# isAllBlockSatisfyPolicy: I feel this method has a lot of duplicate code. It 
is no longer needed if we agree on the proposal below.
# A log message is missing in the following change:
{code}
} else {
+                    blockStorageMovementNeeded.removeRetry(trackId);
+                  }
{code}
# Nice test cases 
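To illustrate the leak in comment 1, here is a minimal sketch of the retry-count bookkeeping. The class and method names (RetryTracker, onRetryProcessed) are hypothetical, not the actual BlockStorageMovementNeeded API; the point is that the count must also be cleared on the path where the block turns out to be satisfied.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the patch's API: retry counts keyed by trackId.
public class RetryTracker {
  private final Map<Long, Integer> retryCounts = new ConcurrentHashMap<>();

  public void increaseRetryCount(long trackId) {
    retryCounts.merge(trackId, 1, Integer::sum);
  }

  public void removeRetry(long trackId) {
    retryCounts.remove(trackId);
  }

  // On each pass over a retried trackId, clear the count as soon as the
  // block is fully satisfied; otherwise the entry leaks when the block
  // reaches enough replicas between two retries.
  public void onRetryProcessed(long trackId, boolean satisfiedNow) {
    if (satisfiedNow) {
      removeRetry(trackId);
    } else {
      increaseRetryCount(trackId);
    }
  }

  public int pendingRetries() {
    return retryCounts.size();
  }

  public static void main(String[] args) {
    RetryTracker t = new RetryTracker();
    t.increaseRetryCount(42L);        // low redundancy seen, retry scheduled
    t.onRetryProcessed(42L, true);    // block satisfied before the retry ran
    System.out.println(t.pendingRetries()); // no leaked entry left behind
  }
}
```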

*To simplify this, here is an alternative approach:*
How about adding another state, case FEW_LOW_REDUNDANCY_BLOCKS: ? If the 
status is FEW_LOW_REDUNDANCY_BLOCKS, you could call 
this.storageMovementsMonitor.add(blockCollectionID, false);
so that the xattr is not removed while processing the result.
Also, you could rename the parameter allBlockLocsAttemptedToSatisfy in 'public 
void add(Long blockCollectionID,
      boolean allBlockLocsAttemptedToSatisfy)' to noRetry. Right now this is a 
boolean; if you want to make it more readable, we could make it a small object 
with a reason and a retry count.
Example: public void add(Long blockCollectionID, RetryInfo retryInfo)
RetryInfo contains: 1. the reason for the retry 2. retriedCount
It seems you are trying to do a definite retry for under-replicated blocks. 
Maybe you could file another JIRA to handle definite retries for all items 
generically. Today, if a file cannot be satisfied for reasons such as not 
enough storages, we may keep retrying. That new JIRA can investigate how 
to make the retries configurable.
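The RetryInfo idea above could be sketched as follows. The field names and the revised add() signature are assumptions for illustration, not the patch's actual code.

```java
// Illustrative sketch of the proposed RetryInfo object; names are assumed,
// not taken from the patch.
public class RetryInfo {
  /** Why the item is being retried, e.g. "FEW_LOW_REDUNDANCY_BLOCKS". */
  private final String reason;
  /** How many times this item has been retried so far. */
  private int retriedCount;

  public RetryInfo(String reason) {
    this.reason = reason;
  }

  public String getReason() { return reason; }
  public int getRetriedCount() { return retriedCount; }
  public void incrementRetriedCount() { retriedCount++; }

  // The monitor's add() would then take this object instead of a bare
  // boolean, e.g.: public void add(Long blockCollectionID, RetryInfo info)
  public static void main(String[] args) {
    RetryInfo info = new RetryInfo("FEW_LOW_REDUNDANCY_BLOCKS");
    info.incrementRetriedCount();
    System.out.println(info.getReason() + " retried "
        + info.getRetriedCount() + " time(s)");
  }
}
```

Carrying both the reason and the count in one object keeps the add() signature stable if more retry metadata is needed later.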

> [SPS]: Should give chance to satisfy the low redundant blocks before removing 
> the xattr
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-11965
>                 URL: https://issues.apache.org/jira/browse/HDFS-11965
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: HDFS-10285
>            Reporter: Surendra Singh Lilhore
>            Assignee: Surendra Singh Lilhore
>         Attachments: HDFS-11965-HDFS-10285.001.patch, 
> HDFS-11965-HDFS-10285.002.patch, HDFS-11965-HDFS-10285.003.patch, 
> HDFS-11965-HDFS-10285.004.patch, HDFS-11965-HDFS-10285.005.patch
>
>
> The test case is failing because all the required replicas are not moved to 
> the expected storage. This happened because of a delay in DataNode 
> registration after the cluster restart.
> Scenario :
> 1. Start cluster with 3 DataNodes.
> 2. Create file and set storage policy to WARM.
> 3. Restart the cluster.
> 4. Now the NameNode and two DataNodes start first and register with the 
> NameNode (one DataNode is not yet registered).
> 5. SPS schedules block movement based on the available DataNodes (it will 
> move one replica to ARCHIVE based on the policy).
> 6. The block movement succeeds and the xattr is removed from the file 
> because the condition {{itemInfo.isAllBlockLocsAttemptedToSatisfy()}} is true.
> {code}
> if (itemInfo != null
>                 && !itemInfo.isAllBlockLocsAttemptedToSatisfy()) {
>               blockStorageMovementNeeded
>                   .add(storageMovementAttemptedResult.getTrackId());
>             ....................
>             ......................
>             } else {
>             ....................
>             ......................
>               this.sps.postBlkStorageMovementCleanup(
>                   storageMovementAttemptedResult.getTrackId());
>             }
> {code}
> 7. Now the third DN registers with the NameNode and reports one more DISK 
> replica. Now the NameNode has two DISK and one ARCHIVE replica.
> In the test case we have a condition to check the number of DISK replicas:
> {code} DFSTestUtil.waitExpectedStorageType(testFileName, StorageType.DISK, 1, 
> timeout, fs);{code}
> This condition never becomes true and the test case times out.
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
