[ https://issues.apache.org/jira/browse/HDFS-11193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744015#comment-15744015 ]

Rakesh R commented on HDFS-11193:
---------------------------------

Thanks [~umamaheswararao] for the useful review comments. The following are the
changes in the new patch; kindly take another look at the latest patch.

* I've addressed review comments 1 and 2.
* While testing, I found an issue in the {{when there is no target node with the
required storage type}} logic. For example, say a block has locations
A(disk), B(disk) and C(disk), only A, B and C are live nodes, and only A and C
have ARCHIVE storage. Now assume the user changes the storage policy to
{{COLD}}. SPS internally prepares the src-target pairing as
{{src=> (A, B, C) and target=> (A, C)}}. It skips B on the target side because
B has no archive media, and this mismatch implicitly signals that SPS should
retry to satisfy all of the block's locations. On the other side, the
coordinator pairs the src-target nodes for the actual physical movement as
{{movetask=> (A, A), (B, C)}}. Ideally it should pair (C, C) instead of
(B, C), but it mistakenly picks B as the source for target C. I think relying
on such an implicit retry signal creates confusion and coding mistakes like
this one. In this patch, I've added a new {{retryNeeded}} flag to make the
intent explicit and more readable. Now SPS prepares only the matching pairs and
avoids dummy source slots, e.g. {{src=> (A, C) and target=> (A, C)}}, and sets
retryNeeded=true to convey that this trackId covers only part of the required
block movements.
* Added one more test for EC striped blocks.
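To make the new pairing rule concrete, here is a minimal Java sketch, assuming nodes are plain strings and that a replica can move either within its own node (e.g. DISK to ARCHIVE on A) or to another node that has the required storage type and does not already hold a replica. {{SpsPairingSketch}}, {{PairingResult}} and {{pair}} are hypothetical names for illustration, not the HDFS-10285 code. The only point it demonstrates is that src and target lists stay index-aligned and an unsatisfiable replica sets {{retryNeeded}} instead of leaving a dummy source slot.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the HDFS-10285 SPS code: node names are plain strings.
public class SpsPairingSketch {

  /** Hypothetical result: index-aligned src/target lists plus an explicit retry flag. */
  static final class PairingResult {
    final List<String> sources = new ArrayList<>();
    final List<String> targets = new ArrayList<>();
    boolean retryNeeded = false;
  }

  /**
   * @param replicaNodes      nodes holding a replica on the wrong storage type
   * @param nodesWithRequired live nodes that have the storage type the policy needs
   */
  static PairingResult pair(List<String> replicaNodes, List<String> nodesWithRequired) {
    PairingResult result = new PairingResult();
    Set<String> usedTargets = new HashSet<>();
    List<String> unmatched = new ArrayList<>();

    // Pass 1: prefer a local move (same node, different storage type).
    for (String src : replicaNodes) {
      if (nodesWithRequired.contains(src)) {
        result.sources.add(src);
        result.targets.add(src);
        usedTargets.add(src);
      } else {
        unmatched.add(src);
      }
    }

    // Pass 2: pick a remote target that is unused and holds no replica yet.
    for (String src : unmatched) {
      String target = null;
      for (String candidate : nodesWithRequired) {
        if (!usedTargets.contains(candidate) && !replicaNodes.contains(candidate)) {
          target = candidate;
          break;
        }
      }
      if (target == null) {
        // No target for this replica: add no dummy source, just flag a retry.
        result.retryNeeded = true;
      } else {
        result.sources.add(src);
        result.targets.add(target);
        usedTargets.add(target);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Scenario from the comment: COLD policy, ARCHIVE storage only on A and C.
    PairingResult r = pair(List.of("A", "B", "C"), List.of("A", "C"));
    System.out.println("src=" + r.sources + " target=" + r.targets
        + " retryNeeded=" + r.retryNeeded);
    // prints: src=[A, C] target=[A, C] retryNeeded=true
  }
}
```

Because each source is appended together with its target, a coordinator can pair entries by index, so a (B, C) style mispairing cannot arise from misaligned lists.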

bq. One other idea I have: how about just including blockIndexes in the case of Striped?
Thanks for the idea. Here is my analysis of this approach. As we know, the NN
presently passes simple {{Block}} objects to the coordinator datanode for
movement. To construct the internal blocks on the DN side, the DN would need the
complex {{BlockInfoStriped}} object as well as the blockIndices array. I think
passing a list of simple objects is better than passing the complex object: it
keeps all the computational complexity on the SPS side and makes the
coordinator logic more readable. So I'd prefer to keep the internal block
construction logic on the NN side. Does this make sense to you?
{code}
+            // construct internal block
+            long blockId = blockInfo.getBlockId() + si.getBlockIndex();
+            long numBytes = StripedBlockUtil.getInternalBlockLength(
+                sBlockInfo.getNumBytes(), sBlockInfo.getCellSize(),
+                sBlockInfo.getDataBlockNum(), si.getBlockIndex());
+            Block blk = new Block(blockId, numBytes,
+                blockInfo.getGenerationStamp());
{code}
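For readers less familiar with striped layouts, here is a dependency-free Java sketch of the arithmetic behind the snippet above, assuming a group of {{numDataBlocks}} data cells per stripe followed by parity cells. {{StripedInternalBlockSketch}} and its methods are hypothetical; the length calculation mirrors my reading of {{StripedBlockUtil#getInternalBlockLength}} rather than calling it, so treat it as an illustration, not a replacement.

```java
// Hypothetical, dependency-free illustration; not the Hadoop StripedBlockUtil class.
public class StripedInternalBlockSketch {

  /** Internal block id = block group id + index within the group (as in the patch). */
  static long internalBlockId(long blockGroupId, int blockIndex) {
    return blockGroupId + blockIndex;
  }

  /** Bytes stored by internal block {@code blockIndex} of a group holding {@code dataSize} bytes. */
  static long internalBlockLength(long dataSize, int cellSize, int numDataBlocks, int blockIndex) {
    if (dataSize < 0 || cellSize <= 0 || numDataBlocks <= 0 || blockIndex < 0) {
      throw new IllegalArgumentException("invalid striping parameters");
    }
    // Bytes covered by one full stripe of data cells (parity cells not counted).
    final long stripeSize = (long) cellSize * numDataBlocks;
    final long lastStripeDataLen = dataSize % stripeSize;
    if (lastStripeDataLen == 0) {
      // Group ends on a stripe boundary: every internal block holds an equal share.
      return dataSize / numDataBlocks;
    }
    final long numStripes = (dataSize - 1) / stripeSize + 1;
    // Full cells from the complete stripes, plus this block's cell in the last stripe.
    return (numStripes - 1) * cellSize
        + lastCellSize(lastStripeDataLen, cellSize, numDataBlocks, blockIndex);
  }

  /** Size of this block's cell in the final, partial stripe. */
  private static long lastCellSize(long lastStripeDataLen, int cellSize,
      int numDataBlocks, int blockIndex) {
    long size = lastStripeDataLen;
    if (blockIndex < numDataBlocks) {
      // Data cell i starts i * cellSize bytes into the last stripe.
      size = Math.max(0, size - (long) blockIndex * cellSize);
    }
    // Parity cells (index >= numDataBlocks) are as long as the first data cell.
    return Math.min(size, cellSize);
  }

  public static void main(String[] args) {
    long groupId = -1024L; // hypothetical block group id
    int cellSize = 1024;
    int dataBlocks = 3;
    int parityBlocks = 2;
    long groupBytes = 5000;
    for (int i = 0; i < dataBlocks + parityBlocks; i++) {
      System.out.println("index " + i + ": id=" + internalBlockId(groupId, i)
          + " length=" + internalBlockLength(groupBytes, cellSize, dataBlocks, i));
    }
    // lengths printed: 2048, 1928, 1024, 2048, 2048
  }
}
```

With 3 data blocks, 1024-byte cells and a 5000-byte group, the data blocks get 2048, 1928 and 1024 bytes (summing to 5000) and each parity block gets 2048 bytes, since a parity cell is as long as the first data cell of its stripe.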

> [SPS]: Erasure coded files should be considered for satisfying storage policy
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-11193
>                 URL: https://issues.apache.org/jira/browse/HDFS-11193
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>         Attachments: HDFS-11193-HDFS-10285-00.patch, 
> HDFS-11193-HDFS-10285-01.patch, HDFS-11193-HDFS-10285-02.patch
>
>
> Erasure coded striped files support the storage policies {{HOT, COLD, ALLSSD}}.
> A {{HdfsAdmin#satisfyStoragePolicy}} API call on a directory should consider
> all immediate files under that directory and check whether those files
> really match the namespace storage policy. All mismatched striped blocks
> should be chosen for block movement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
