[
https://issues.apache.org/jira/browse/HDFS-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15015401#comment-15015401
]
Mark Sun commented on HDFS-9411:
--------------------------------
Hi [~vinayrpet], sorry for the late reply.
* *About use DataNode configuration or not.*
** I mixed up StorageType and DataNode StorageInfo. I mean maybe the DataNode
can store the zone_label in the VERSION file of each Volume, or in another file in
the same directory. Sure, if so, we would have to add a new command to the heartbeat.
** Besides the considerations we already discussed, setting zone_label on the Volume
could be better in case we want Volume-level isolation in the future.
** Updating zone_label is not like applying a simple configuration; it looks
more like refreshNodes:
*** Both NameNode and DataNode are involved.
*** In some cases, we should ensure the necessary work (block moving or
erasure) is done before the label finally changes.
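To make the per-Volume storage idea concrete, here is a minimal sketch of persisting a zone_label key in a VERSION-style properties file. The key name {{zoneLabel}}, the method names, and the fallback behavior are all illustrative assumptions, not actual HDFS code:

```java
import java.io.*;
import java.util.Properties;

// Sketch: keep a hypothetical "zoneLabel" entry alongside the other
// per-volume fields in a VERSION-style properties file. Key name and
// helpers are illustrative only.
public class VolumeZoneLabel {

    // Write (or update) the zone label in the given VERSION file,
    // preserving any existing properties.
    static void writeZoneLabel(File versionFile, String zone) throws IOException {
        Properties props = new Properties();
        if (versionFile.exists()) {
            try (InputStream in = new FileInputStream(versionFile)) {
                props.load(in);
            }
        }
        props.setProperty("zoneLabel", zone);
        try (OutputStream out = new FileOutputStream(versionFile)) {
            props.store(out, "volume VERSION (sketch)");
        }
    }

    // Read the zone label back, falling back to a default when absent --
    // useful for volumes written before the feature existed.
    static String readZoneLabel(File versionFile, String defaultZone) throws IOException {
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(versionFile)) {
            props.load(in);
        }
        return props.getProperty("zoneLabel", defaultZone);
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("VERSION", null);
        writeZoneLabel(f, "zone_a");
        System.out.println(readZoneLabel(f, "DEFAULT_ZONE")); // zone_a
        f.delete();
    }
}
```

The DataNode would then report this label to the NameNode via the new heartbeat command mentioned above.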
* *About BPP related API.*
** Your comments made things a little bit clearer to me:
*** The client does not set the xattr explicitly; instead, the ZoneProvider in the
NameNode reads the policy from the user context and zone stats, and then sets the
zone layout as an xattr on the file.
*** Use the set/getZone API & CLI to change the zone layout explicitly after the
file is created; new files/subdirs can inherit the zone layout from their parent.
** Still not clear:
*** Where does the NameNode get the ZoneProvider implementation for a certain user?
*** If the NameNode stores the zone layout in an xattr, how do we deal with
zoneStats changes? How do we distinguish between “we want to store the block in
zone_a” and “we want to store the block in zone_b, but zone_b is full according to
zoneStats, so we use zone_a”?
*** How do we deal with large files? zoneStats changes constantly; will we
update the zone layout with new zoneStats for later blocks?
** Proposal:
*** The NameNode does not store the xattr; it just re-calculates the layout for
each block allocation.
*** If we want a per-file policy, we should store *the policy itself* in the
xattr, not the layout result, because zoneStats changes.
*** Maybe we also need to pass & store policies to the DataNodes, for a new
zone-based Balancer or Mover.
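A small sketch of the proposal above: the xattr holds the policy, and the target zone is re-evaluated against the zoneStats of the moment at each block allocation, so a stale layout is never frozen into metadata. The xattr name, the trivial "policy = preferred zone" encoding, and the free-space fallback are all assumptions for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.*;

// Sketch: store the *policy* in the xattr and re-evaluate it against
// current zoneStats for every block allocation. All names here
// (user.zone.policy, the Map stand-ins) are hypothetical.
public class PolicyXattrSketch {

    static final String POLICY_XATTR = "user.zone.policy";

    // Stand-in for a file's xattrs.
    static Map<String, byte[]> xattrs = new HashMap<>();

    static void setPolicy(String policyName) {
        xattrs.put(POLICY_XATTR, policyName.getBytes(StandardCharsets.UTF_8));
    }

    // Re-compute the target zone per block: a zone that zoneStats says
    // is full right now is skipped, instead of being baked into a
    // stored layout that later turns stale.
    static String chooseZoneForBlock(Map<String, Long> zoneFreeBytes, long blockSize) {
        String policy = new String(xattrs.get(POLICY_XATTR), StandardCharsets.UTF_8);
        String preferred = policy; // trivial policy: name is the preferred zone
        Long free = zoneFreeBytes.get(preferred);
        if (free != null && free >= blockSize) {
            return preferred;
        }
        // Preferred zone is full per current zoneStats: fall back to
        // whichever zone has the most free space at this instant.
        return zoneFreeBytes.entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey)
            .orElseThrow(IllegalStateException::new);
    }

    public static void main(String[] args) {
        setPolicy("zone_b");
        Map<String, Long> stats = new HashMap<>();
        stats.put("zone_a", 500L);
        stats.put("zone_b", 10L);   // zone_b currently full
        System.out.println(chooseZoneForBlock(stats, 100L)); // zone_a
        stats.put("zone_b", 1000L); // stats changed before the next block
        System.out.println(chooseZoneForBlock(stats, 100L)); // zone_b
    }
}
```

Note how the two allocations of the same file land in different zones purely because zoneStats changed in between, which is exactly the large-file concern raised above.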
* *About feature on-off control and DEFAULT_ZONE*
** Agree with you, it’s just a philosophy-level tendency. I just think
DEFAULT_ZONE can make the API uniform with less if/else; by default, every node
belongs to DEFAULT_ZONE, and the code path will be the same as the current
implementation.
** Or, as another option, we may use polymorphism to reduce the if/else and code
complexity caused by on-off control.
** Sure, both are OK :)
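The two options above can be combined: with polymorphism, the "feature off" case is just a trivial resolver that puts every node in DEFAULT_ZONE, so callers take one uniform code path with no on/off branching. The interface and class names below are hypothetical:

```java
import java.util.*;

// Sketch of the polymorphism idea: "feature off" is a resolver that
// maps everything to DEFAULT_ZONE, so there is no if (zonesEnabled)
// at the call sites. All names are illustrative.
public class ZoneResolverSketch {

    interface ZoneResolver {
        String zoneOf(String datanode);
    }

    // Feature off: every node is in DEFAULT_ZONE, so placement behaves
    // exactly like the current zone-less implementation.
    static class DefaultZoneResolver implements ZoneResolver {
        public String zoneOf(String datanode) { return "DEFAULT_ZONE"; }
    }

    // Feature on: look the label up in an admin-supplied mapping,
    // with unlabelled nodes still falling back to DEFAULT_ZONE.
    static class LabelZoneResolver implements ZoneResolver {
        private final Map<String, String> labels;
        LabelZoneResolver(Map<String, String> labels) { this.labels = labels; }
        public String zoneOf(String datanode) {
            return labels.getOrDefault(datanode, "DEFAULT_ZONE");
        }
    }

    public static void main(String[] args) {
        ZoneResolver off = new DefaultZoneResolver();
        ZoneResolver on = new LabelZoneResolver(Map.of("dn1", "zone_a"));
        // Same call site either way -- no feature on/off branching.
        System.out.println(off.zoneOf("dn1")); // DEFAULT_ZONE
        System.out.println(on.zoneOf("dn1"));  // zone_a
        System.out.println(on.zoneOf("dn2"));  // DEFAULT_ZONE
    }
}
```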
* *Make zone transparent to existing BPPs*
** I proposed expanding a zone to a node list to make zones transparent to BPPs, because:
*** The code is more generic: the new constraint is that a BPP should put X replicas
into a set<node>, but it doesn’t restrict that this constraint must come from the
ZoneProvider. (We can even use this constraint to re-implement
local-node choosing.)
*** It is more flexible: in some ambiguous scenarios, one replica can be placed
in either zone_a or zone_b.
*** The code is loosely coupled: BPP code modifications don’t need to know about
zone_label.
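A minimal sketch of this expansion, assuming hypothetical names throughout: the ZoneProvider resolves labels to a concrete node set before the BPP runs, and the BPP only ever sees the set<node> constraint, never the label. The "either zone" case is just the union of two node lists:

```java
import java.util.*;

// Sketch of "expand zone to node list": zone membership lives outside
// the BPP; the BPP only receives an allowed node set and never learns
// how it was built. All names are illustrative.
public class ZoneExpansionSketch {

    // Zone membership maintained by a hypothetical ZoneProvider.
    static Map<String, Set<String>> zoneToNodes = new HashMap<>();

    // The only new BPP-facing constraint: put up to `replicas` targets
    // inside the allowed node set. (A real BPP would also apply rack
    // rules; this placeholder just filters.)
    static List<String> chooseTargets(Collection<String> allNodes,
                                      Set<String> allowed, int replicas) {
        List<String> picked = new ArrayList<>();
        for (String node : allNodes) {
            if (allowed.contains(node) && picked.size() < replicas) {
                picked.add(node);
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        zoneToNodes.put("zone_a", new HashSet<>(Arrays.asList("dn1", "dn2")));
        zoneToNodes.put("zone_b", new HashSet<>(Arrays.asList("dn3")));
        List<String> cluster = Arrays.asList("dn1", "dn2", "dn3", "dn4");

        // Single-zone case: replicas restricted to zone_a's nodes.
        System.out.println(chooseTargets(cluster, zoneToNodes.get("zone_a"), 2));

        // "Either zone" case: the union of zone_a and zone_b node lists.
        Set<String> either = new HashSet<>(zoneToNodes.get("zone_a"));
        either.addAll(zoneToNodes.get("zone_b"));
        System.out.println(chooseTargets(cluster, either, 3)); // [dn1, dn2, dn3]
    }
}
```

Because the BPP signature only gains a node-set parameter, the same mechanism could express other constraints (e.g. local-node choosing) without any zone_label awareness in BPP code.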
> HDFS ZoneLabel support
> ----------------------
>
> Key: HDFS-9411
> URL: https://issues.apache.org/jira/browse/HDFS-9411
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Vinayakumar B
> Assignee: Vinayakumar B
> Attachments: HDFS_ZoneLabels-16112015.pdf
>
>
> HDFS currently stores data blocks on different datanodes chosen by the
> BlockPlacementPolicy. These datanodes are random within the
> scope (local-rack/different-rack/nodegroup) of the network topology.
> In a multi-tenant scenario (a tenant can be a user/service), blocks of any tenant
> can be on any datanodes.
> Depending on the applications of different tenants, a datanode might sometimes get
> busy, making another tenant's applications slow down. It would be better if
> admins had a provision to logically divide the cluster among multiple tenants.
> ZONE_LABELS can logically divide the cluster datanodes into multiple Zones.
> High level design doc to follow soon.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)