[
https://issues.apache.org/jira/browse/HDFS-11193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744015#comment-15744015
]
Rakesh R commented on HDFS-11193:
---------------------------------
Thanks [~umamaheswararao] for the useful review comments. Following are the
changes in the new patch; kindly take another look at the latest patch.
* I've fixed review comments 1 and 2.
* While testing I found an issue in the {{when there is no target node with the
required storage type}} logic. For example, take a block with locations
A(disk), B(disk), C(disk), and assume A, B and C are the only live nodes, with
A & C also having ARCHIVE storage. Now assume the user changes the storage
policy to {{COLD}}. SPS internally prepares the src-target pairing like
{{src=> (A, B, C) and target=> (A, C)}}. It skips B as a target because B has
no archive media, which is an indication that SPS should retry later to
satisfy all of the block locations. On the other side, the coordinator pairs
the src-target nodes for the actual physical movement like
{{movetask=> (A, A), (B, C)}}. Ideally it should pair (C, C) instead of
(B, C), but it mistakenly picks B as the source rather than C. I think the
implicit assumption that a retry is needed will create confusion and coding
mistakes like this. In this patch I've introduced a new {{retryNeeded}} flag
to make it more readable. Now SPS prepares only the matching pairs and avoids
dummy source slots, like {{src=> (A, C) and target=> (A, C)}}, and sets
retryNeeded=true to convey that this trackId has only partial block movements
(see the sketch after this list).
* Added one more test for an EC striped block.
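To illustrate the matched pairing with the new flag, here is a minimal,
hypothetical sketch; the class, method and field names are illustrative only
and are not the actual patch code:
{code}
// Hypothetical sketch of the matched src-target pairing with a retryNeeded
// flag; an illustration of the idea above, not the patch code.
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

class PairingSketch {
  static final class Result {
    final List<DatanodeInfo> sources = new ArrayList<>();
    final List<DatanodeInfo> targets = new ArrayList<>();
    boolean retryNeeded = false; // true => only partial block movements
  }

  static Result pair(List<DatanodeInfo> blockLocations,
      Set<DatanodeInfo> archiveCapableNodes) {
    Result r = new Result();
    Set<DatanodeInfo> freeTargets = new HashSet<>(archiveCapableNodes);
    List<DatanodeInfo> unpaired = new ArrayList<>();

    // Pass 1: prefer moving a replica to ARCHIVE on the node it already
    // lives on, e.g. (A, A) and (C, C) in the example above.
    for (DatanodeInfo src : blockLocations) {
      if (freeTargets.remove(src)) {
        r.sources.add(src);
        r.targets.add(src);
      } else {
        unpaired.add(src);
      }
    }
    // Pass 2: pair any leftover sources with remaining ARCHIVE-capable nodes.
    for (DatanodeInfo src : unpaired) {
      if (freeTargets.isEmpty()) {
        // No target with the required storage type is left; emit only the
        // matching pairs and mark the trackId for a later retry.
        r.retryNeeded = true;
        break;
      }
      DatanodeInfo target = freeTargets.iterator().next();
      freeTargets.remove(target);
      r.sources.add(src);
      r.targets.add(target);
    }
    return r;
  }
}
{code}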
bq. One another idea in my mind is that, how about just including blockIndexes
in the case of Striped?
Thanks for this idea. Here is my analysis of this approach. As we know,
presently the NN passes simple {{Block}} objects to the coordinator datanode
for the movement. In order to do the internal block construction at the DN
side, it would need the complex BlockInfoStriped object plus the blockIndices
array. I think passing a list of simple objects is better than passing the
complex object; it keeps all the computation complexity on the SPS side and
makes the coordinator logic more readable. I'd prefer to keep the internal
block construction logic at the NN side. Does this make sense to you?
{code}
+      // construct internal block
+      long blockId = blockInfo.getBlockId() + si.getBlockIndex();
+      long numBytes = StripedBlockUtil.getInternalBlockLength(
+          sBlockInfo.getNumBytes(), sBlockInfo.getCellSize(),
+          sBlockInfo.getDataBlockNum(), si.getBlockIndex());
+      Block blk = new Block(blockId, numBytes,
+          blockInfo.getGenerationStamp());
{code}
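For context, a rough sketch of how the NN side could expand a striped block
group into the list of simple internal {{Block}} objects handed over to the
coordinator; the helper method and the {{blockIndices}} parameter are
assumptions for illustration and not the patch code:
{code}
// Hypothetical helper: expand a BlockInfoStriped into per-storage internal
// blocks so that the coordinator only ever sees simple Block objects.
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hdfs.protocol.Block;
import org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStriped;
import org.apache.hadoop.hdfs.util.StripedBlockUtil;

class InternalBlockSketch {
  static List<Block> toInternalBlocks(BlockInfoStriped sBlockInfo,
      byte[] blockIndices) {
    List<Block> internalBlocks = new ArrayList<>();
    for (byte blockIndex : blockIndices) {
      // internal block id = block group id + index within the group
      long blockId = sBlockInfo.getBlockId() + blockIndex;
      long numBytes = StripedBlockUtil.getInternalBlockLength(
          sBlockInfo.getNumBytes(), sBlockInfo.getCellSize(),
          sBlockInfo.getDataBlockNum(), blockIndex);
      internalBlocks.add(new Block(blockId, numBytes,
          sBlockInfo.getGenerationStamp()));
    }
    return internalBlocks;
  }
}
{code}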
> [SPS]: Erasure coded files should be considered for satisfying storage policy
> -----------------------------------------------------------------------------
>
> Key: HDFS-11193
> URL: https://issues.apache.org/jira/browse/HDFS-11193
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Reporter: Rakesh R
> Assignee: Rakesh R
> Attachments: HDFS-11193-HDFS-10285-00.patch,
> HDFS-11193-HDFS-10285-01.patch, HDFS-11193-HDFS-10285-02.patch
>
>
> Erasure coded striped files support the storage policies {{HOT, COLD, ALLSSD}}.
> An {{HdfsAdmin#satisfyStoragePolicy}} API call on a directory should consider
> all immediate files under that directory and check whether those files really
> match the namespace storage policy. All the mismatched striped blocks should
> be chosen for block movement.
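As a usage illustration for the API mentioned above, here is a sketch assuming
the {{HdfsAdmin#satisfyStoragePolicy}} call from the HDFS-10285 branch; the
cluster URI and path are examples only:
{code}
// Sketch: mark a directory COLD and ask SPS to move the mismatched blocks.
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsAdmin;

public class SatisfyPolicyExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    HdfsAdmin admin = new HdfsAdmin(URI.create("hdfs://namenode:8020"), conf);

    Path dir = new Path("/archive/ec-data"); // example directory
    admin.setStoragePolicy(dir, "COLD");     // changes the namespace policy only
    admin.satisfyStoragePolicy(dir);         // SPS schedules the block moves
  }
}
{code}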