[ https://issues.apache.org/jira/browse/HDFS-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15361119#comment-15361119 ]
Vinayakumar B commented on HDFS-9411:
-------------------------------------

Thanks for taking a look, [~drankye].

bq. This sounds like storage policy? How about rename?
A renamed file/directory will also carry its original label expression. This could be done by storing the inherited label expression on the file/directory being renamed.

bq. Is there any means to specify a label or label expression is STRICT or not (OPTIONAL)?
As I mentioned, STRICT is the mode for the initial development, and it is not optional. Different modes could be supported later.

bq. A minor, I thought you may mean, "So to remove a label, admin can ..."
Thanks for the find, will fix in the next rev.

bq. This sounds good. Such label spec would be good to be in common side so HDFS and YARN can share it consistently.
I thought about it initially. Bringing both sets of code into common may need a few more changes, as YARN node labels are already part of releases. As of now, I am looking to keep the user-facing API/commands and behavior in sync with YARN.

bq. I'm not sure how it's done in YARN, maybe a property file in datanode letting admin list the labels there? Some labels like arch, OS can be automatically detected or discovered while datanode starting. I'm thinking about how to make labels easy to configure and use.
AFAIK, YARN also uses admin commands to assign labels to nodes, and the RM then stores them in node-storage, which is persisted. But in HDFS, unlike YARN, nothing related to nodes is persisted in the NN; everything is built dynamically. Unlike NodeManagers, datanodes hold persisted user data, so it is better to allow labels to be specified only via admin commands.

bq. From HDFS perspective this sounds pretty good, and my overall suggestion would be, define and make the basic node label support in common side, in order to:
bq. 1) generic node label isn't essentially specific to HDFS, though some labels are.
bq. 2) shared by both HDFS and YARN in future, so admin may save some work, for example, using some common means admin can just specify all the labels for a node in a time, for both YARN and HDFS.
bq. 3) consistent in logic and behavior. Roughly, a job for a tenant should be scheduled to the datanodes where the input data reside for locality.
bq. 4) broad discussion to involve YARN guys.
bq. I understand it's not easy to split, but would be good to think about it. Thanks.
Thank you. I know it would be good to be generic and make it common. However, even for current features, few things are common between HDFS and YARN from an admin's point of view. For example, the underlying disks might be the same for both HDFS and YARN, yet they need to be configured separately in each. Moreover, I feel that refactoring the already available YARN node labels would be risky. Maybe this combining and refactoring can be taken up later?

> HDFS NodeLabel support
> ----------------------
>
>                 Key: HDFS-9411
>                 URL: https://issues.apache.org/jira/browse/HDFS-9411
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Vinayakumar B
>            Assignee: Vinayakumar B
>         Attachments: HDFSNodeLabels-20-06-2016.pdf, HDFS_ZoneLabels-16112015.pdf
>
> HDFS currently stores data blocks on different datanodes chosen by the
> BlockPlacement Policy. These datanodes are random within the
> scope (local-rack/different-rack/nodegroup) of the network topology.
> In a multi-tenant scenario (a tenant can be a user or a service), blocks of
> any tenant can be on any datanode.
> Depending on the applications of different tenants, a datanode might
> sometimes get busy, causing another tenant's application to slow down. It
> would be better if admins had a provision to logically divide the cluster
> among multiple tenants.
> NodeLabels add more options for users to specify constraints, so that
> specific nodes with specific requirements can be selected.
> High level design doc to follow soon.
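
To illustrate the rename behavior mentioned in the comment above (a renamed file/directory keeps its original label expression by storing the inherited expression on it), here is a minimal Java sketch. It is only an in-memory model under stated assumptions: the class LabelRenameSketch, its Node type, the rename helper, and the label values "tenantA"/"tenantB" are all hypothetical names for illustration and are not taken from HDFS or from the design doc.

```java
import java.util.HashMap;
import java.util.Map;

public class LabelRenameSketch {

    /** Hypothetical stand-in for a namespace entry; not an HDFS class. */
    static final class Node {
        String name;
        Node parent;
        String labelExpression; // null means "inherit from the parent"
        final Map<String, Node> children = new HashMap<>();

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
            if (parent != null) {
                parent.children.put(name, this);
            }
        }

        /** Walks up the tree until an explicitly set expression is found. */
        String effectiveLabelExpression() {
            for (Node n = this; n != null; n = n.parent) {
                if (n.labelExpression != null) {
                    return n.labelExpression;
                }
            }
            return null; // no label expression anywhere on the path
        }
    }

    /** Moves src under newParent, pinning the inherited expression first. */
    static void rename(Node src, Node newParent, String newName) {
        if (src.parent == null) {
            throw new IllegalArgumentException("cannot rename the root");
        }
        // Assumption: store the inherited expression on the node itself so
        // that it survives the move to a differently labeled parent.
        if (src.labelExpression == null) {
            src.labelExpression = src.effectiveLabelExpression();
        }
        src.parent.children.remove(src.name);
        src.name = newName;
        src.parent = newParent;
        newParent.children.put(newName, src);
    }

    public static void main(String[] args) {
        Node root = new Node("", null);
        Node a = new Node("a", root);
        a.labelExpression = "tenantA";
        Node file = new Node("f", a); // inherits from /a
        Node b = new Node("b", root);
        b.labelExpression = "tenantB";

        rename(file, b, "g"); // /a/f -> /b/g
        System.out.println(file.effectiveLabelExpression());
    }
}
```

In this sketch the expression is pinned only when the entry has none of its own, so an explicitly labeled file/directory is left unchanged by a rename.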