[
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274977#comment-16274977
]
Anu Engineer edited comment on HDFS-10285 at 12/1/17 9:08 PM:
--------------------------------------------------------------
bq. Is it trivial? I think we still need some type of fencing so there's only
one active SPS. Does this use zookeeper, like NN HA?
Yes, that would be the simplest approach to getting SPS HA.
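For illustration, here is a minimal sketch of ZooKeeper-based fencing for a single active SPS, using Apache Curator's LeaderLatch recipe (the connect string and latch path below are assumptions for the sketch, not part of any SPS patch):
{code:java}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class SpsLeaderElection {
  public static void main(String[] args) throws Exception {
    // Hypothetical ZK ensemble; in practice this would come from configuration.
    CuratorFramework zk = CuratorFrameworkFactory.newClient(
        "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
    zk.start();

    // Every SPS instance contends on the same znode; only one holds the
    // latch at a time, so only one SPS is ever active.
    LeaderLatch latch = new LeaderLatch(zk, "/hdfs/sps-leader");
    latch.start();
    latch.await(); // blocks until this instance is elected leader

    // From here on, this instance is the active SPS: it can read the pending
    // satisfier state from the NN and resume processing.
    System.out.println("This SPS instance is now active.");
  }
}
{code}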
bq. If there's an SPS failover, how does the new active know where to resume?
Once the active knows it is the leader, it can read the state from the NN and
continue. The continuity issues are exactly the same whether SPS runs inside
the NN or outside it.
bq. I'm also wondering how progress is tracked, so we can resume without
iterating over significant portions of the namespace.
As soon as a block is moved, the move call updates the status of that block
move, so the NN is always up to date with that information. Each time the SPS
API is called, the NN keeps track of the request, and the post-move updates
let us filter out the completed blocks and resume with only the remaining
ones.
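As a purely illustrative sketch (the class and method names below are assumptions, not taken from any SPS patch), the bookkeeping can be as simple as a per-request set of pending blocks that shrinks as move results are reported back, so a restarted or failed-over SPS only ever touches what is left:
{code:java}
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical tracker: one entry per SPS API call, keyed by a track id,
// holding the blocks that still need to be moved for that request.
public class SpsProgressTracker {
  private final Map<Long, Set<Long>> pendingBlocks = new ConcurrentHashMap<>();

  // Invoked when the SPS API is called for a path.
  public void track(long trackId, Set<Long> blockIds) {
    Set<Long> pending = ConcurrentHashMap.newKeySet();
    pending.addAll(blockIds);
    pendingBlocks.put(trackId, pending);
  }

  // Invoked when a DN reports a successful block move; this is the update
  // that keeps the NN current.
  public void blockMoved(long trackId, long blockId) {
    Set<Long> remaining = pendingBlocks.get(trackId);
    if (remaining != null) {
      remaining.remove(blockId);
      if (remaining.isEmpty()) {
        pendingBlocks.remove(trackId); // request fully satisfied
      }
    }
  }

  // A resuming SPS reads this instead of re-iterating the namespace.
  public Set<Long> remaining(long trackId) {
    return pendingBlocks.getOrDefault(trackId, Set.of());
  }
}
{code}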
bq. I also like centralized control when it comes to coordinating block work.
The NN schedules and prioritizes block work on the cluster. Already it's
annoying to users to have to configure a separate set of resource throttles for
the balancer work, and it makes the system less reactive to cluster health
events. We'd much rather have a single resource allocation for all cluster
maintenance work, which the NN can use however it wants based on its priority.
By that argument, Balancer should be the first tool that moves into the
Namenode, and then DiskBalancer. Right now, the SPS approach follows what we
already do in the HDFS world, that is, block moves are achieved through an
async mechanism. If you would like to provide a generic block-mover mechanism
in the Namenode and then port Balancer and DiskBalancer to it, you are most
welcome; I will be glad to move SPS to that framework when we have it.
bq. What is the concern about NN overhead, for this feature in particular? This
is similar to what I asked Uma earlier about the coordinator DN; I don't think
it meaningfully shifts work off the NN
There are a few concerns:
# Following the established pattern of Balancer, Mover, DiskBalancer, etc.
# Memory and CPU overhead in the Namenode.
# Future directions -- if we have to support finer-grained mechanisms like
smart storage management or moving data into PROVIDED storage, it is better
for this to run as an independent service.
And most important, we are just accelerating an SPS future work item: making
SPS a separate service has been a booked plan, so we are simply achieving that
goal before the merge. Nothing fundamentally changes about SPS.
> Storage Policy Satisfier in Namenode
> ------------------------------------
>
> Key: HDFS-10285
> URL: https://issues.apache.org/jira/browse/HDFS-10285
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, namenode
> Affects Versions: HDFS-10285
> Reporter: Uma Maheswara Rao G
> Assignee: Uma Maheswara Rao G
> Attachments: HDFS-10285-consolidated-merge-patch-00.patch,
> HDFS-10285-consolidated-merge-patch-01.patch,
> HDFS-10285-consolidated-merge-patch-02.patch,
> HDFS-10285-consolidated-merge-patch-03.patch,
> HDFS-SPS-TestReport-20170708.pdf,
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf,
> Storage-Policy-Satisfier-in-HDFS-May10.pdf,
> Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of a storage policy. These
> policies can be set on a directory or file to specify the user's preference for
> where the physical blocks should be stored. When the user sets the storage
> policy before writing data, the blocks can take advantage of the policy
> preferences and are placed on the matching storage types accordingly.
> If the user sets the storage policy after the file has been written and
> completed, the blocks will already have been written with the default storage
> policy (namely DISK). The user then has to run the Mover tool explicitly,
> specifying all such file names as a list. In some distributed-system scenarios
> (e.g., HBase) it would be difficult to collect all the files and run the tool,
> because different nodes can write files independently and the files can have
> different paths.
> Another scenario: when the user renames a file from a directory with one
> effective storage policy (inherited from its parent directory) into a directory
> with a different storage policy, the rename does not copy the inherited policy
> from the source, so the file's effective policy comes from the destination's
> parent. The rename operation is just a metadata change in the Namenode; the
> physical blocks still remain placed according to the source storage policy.
> So, tracking all such file names, driven by business logic across distributed
> nodes (e.g., region servers), and then running the Mover tool could be
> difficult for admins.
> The proposal here is to provide an API from the Namenode itself to trigger
> storage policy satisfaction. A daemon thread inside the Namenode would track
> such calls and dispatch block-movement commands to the DNs.
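> To make the proposal concrete, here is a minimal sketch of how such a trigger
> might be used (the satisfyStoragePolicy method name is an assumption for
> illustration; the actual API will be defined in the design document):
> {code:java}
> import java.net.URI;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hdfs.DistributedFileSystem;
>
> public class SatisfyPolicyExample {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration();
>     DistributedFileSystem dfs = (DistributedFileSystem)
>         DistributedFileSystem.get(URI.create("hdfs://namenode:8020"), conf);
>
>     Path dir = new Path("/hbase/data"); // hypothetical path
>     // Changing the policy alone is a metadata-only operation; existing
>     // blocks stay where they were originally written.
>     dfs.setStoragePolicy(dir, "COLD");
>
>     // Proposed call: ask the NN to schedule the block moves needed to
>     // satisfy the new policy, instead of running the Mover tool by hand.
>     dfs.satisfyStoragePolicy(dir);
>   }
> }
> {code}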
> We will post a detailed design document soon.