[
https://issues.apache.org/jira/browse/HDFS-11639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007247#comment-16007247
]
Virajith Jalaparti commented on HDFS-11639:
-------------------------------------------
Thanks for working on this [~ehiggs]. I think the general direction of the
patch (using {{BlockProvider#resolve()}} in {{ProvidedBlocksBuilder}} to get
the {{BlockAlias}}, passing it along as a set of bytes to the client, and
resolving it only on the Datanode) looks good.
A few comments:
* In {{ProvidedBlocksBuilder#newLocatedBlock}}, the {{fileRegion}} should be
resolved only if the block has {{PROVIDED}} locations (i.e.,
{{hasProvidedLocations}} is true). When {{dfs.namenode.provided.enabled}} is
set to true, all {{LocatedBlock}} are created in this method, and for
non-provided blocks, a resolution of {{BlockAlias}} is needed.
* {{PBHelperClient#convertLocatedBlockProto()}} and
{{PBHelperClient#convertLocatedBlock()}} should be modified to decode/encode
the {{BlockAlias}} bytes.
* How about decoding the {{blockAlias}} bytes in {{DataXceiver#readBlock}}
using a new {{DataTransferProtoUtil#blockAliasFromProto(bytes[] blockAlias}}
method instead of using the {{BlockAlias#builder()}}? The former will be
in-line with the way the protobufs are decoded in {{DataXceiver}}. Further, if
in the future a different {{BlockAlias}} is used, the current implementation of
using the {{FileRegion#Builder}} in {{DataXceiver#readBlock}} will be hard to
extend (will end up being try {{FileRegion#Builder}}, if null try
{{BlockAliasXX#Builder}} and so on).
* Similar to passing on BlockAlias from {{DataXceiver#readBlock}} to
{{BlockSender}}, it should be passed along from {{DataXceiver#readBlock}} to
{{BlockReceiver}}. However, we would not need it till we have writes
implemented.
* {{DataStreamer#blockAlias}} will never be non-null. I think it should be
initialized in {{DFSOutputStream}}.
* Current implementation of {{TextReader#resolve}} can be expensive as it has
to scan through the whole list of blocks. I don't have a very good solution for
this as one alternative would be to maintain an in-memory map to store this
information. I think it would be good to have both implementations, although
this is not a blocker.
* In {{DataTransfer#run()}}, the {{blockAlias}} should be null unless it is for
a provided block. I think this will entail adding BlockAlias to
{{transferBlock()}} and also to {{BlockCommand}} for the
{{DatanodeProtocol.DNA_TRANSFER}} command. However, this will only be relevant
for writing provided blocks (and in particular recovery).
* I don't think
{{TestDataTransferProtocol#testDataTransferProtocolWithBlockAlias}} is
currently correct/tests something useful. Shouldn't block alias be relevant for
only provided blocks?
Looking over this patch, one thing that occurred to me is if it makes sense to
unify {{FileRegionProvider}} with {{BlockProvider}}? They both have very close
functionality.
I like the use of {{BlockProvider#resolve()}}. If we unify
{{FileRegionProvider}} with {{BlockProvider}}, then {{resolve}} can return
{{null}} if the block map is accessible from the Datanodes also. If it is
accessible only from the Namenode, then a non-null value can be propagated to
the Datanode.
One of the motivations for adding the {{BlockAlias}} to the client protocol was
to have the blocks map only on the Namenode. In this scenario, the
{{ReplicaMap}} in {{FsDatasetImpl}} of will not have any replicas apriori.
Thus, one way to ensure that the {{FsDatasetImpl}} interface continues to
function as today is to create a {{FinalizedProvidedReplica}} in
{{FsDatasetImpl#getBlockInputStream}} when {{BlockAlias}} is not null.
> [READ] Encode the BlockAlias in the client protocol
> ---------------------------------------------------
>
> Key: HDFS-11639
> URL: https://issues.apache.org/jira/browse/HDFS-11639
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs
> Reporter: Ewan Higgs
> Assignee: Virajith Jalaparti
> Attachments: HDFS-11639-HDFS-9806.001.patch,
> HDFS-11639-HDFS-9806.002.patch
>
>
> As part of the {{PROVIDED}} storage type, we have a {{BlockAlias}} type which
> encodes information about where the data comes from. i.e. URI, offset,
> length, and nonce value. This data should be encoded in the protocol
> ({{LocatedBlockProto}} and the {{BlockTokenIdentifier}}) when a block is
> available using the PROVIDED storage type.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]