[jira] [Commented] (HDFS-10285) Storage Policy Satisfier in Namenode

Uma Maheswara Rao G (JIRA) Mon, 04 Dec 2017 13:27:27 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277538#comment-16277538
 ]


Uma Maheswara Rao G commented on HDFS-10285:
--------------------------------------------

[~andrew.wang] and [~anu], thanks for having these discussions early rather 
than later.

Among Andrew’s points, a very valid one to address is the operational cost of 
an additional process. 
When we had an offline discussion with the HBase folks, the HBase team 
expressed a similar feeling: they don’t like to start an additional process. It 
would be great if others also commented on this aspect. Is it a good decision 
to add new processes in HDFS?
In my summary mail I said, “I think it’s a reasonable compromise”. I am holding 
to that until we hear more opinions on whether it is a reasonable compromise 
or not.

From Anu’s comments: since the balancer and disk balancer are able to get 
enough information from the NameNode, why can’t SPS run that way too? He is 
also making the point that SPS would communicate with the NN just like any 
other application. 

I also learned from the discussion that the community is more interested in an 
additional new process, judging from HDFS-6382. I am not sure that is true for 
all cases. I agree that things and opinions can change based on experience. :-)

I have another point to discuss here: when SPS runs outside, it will 
communicate with the NN like an application, so won’t this increase the RPC 
load on an already busy NN? Would this then require another level of 
throttling in SPS to control it? 
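
To make the throttling concern concrete, here is a minimal sketch of the kind 
of client-side limiter an external SPS might need in front of its NN calls. 
Everything in it is an assumption for illustration: the class name 
NamenodeRpcThrottle, the token-bucket approach, and the rate/burst parameters 
do not come from the SPS patch or from HDFS.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical client-side throttle for an SPS running outside the NN.
 * Illustrative only: this class is not part of HDFS or of the SPS patch.
 */
final class NamenodeRpcThrottle {
    private final long capacity;       // largest burst of calls allowed
    private final long intervalNanos;  // time needed to earn one token
    private final LongSupplier clock;  // nanosecond clock, injectable for tests
    private long tokens;
    private long lastRefillNanos;

    NamenodeRpcThrottle(int callsPerSecond, long burst, LongSupplier nanoClock) {
        if (callsPerSecond <= 0 || callsPerSecond > 1_000_000_000 || burst <= 0) {
            throw new IllegalArgumentException("rate and burst must be positive");
        }
        this.capacity = burst;
        this.intervalNanos = 1_000_000_000L / callsPerSecond;
        this.clock = nanoClock;
        this.tokens = burst;  // start with a full bucket
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /** Returns true if one RPC may be sent now, false if the caller should back off. */
    synchronized boolean tryAcquire() {
        long elapsed = clock.getAsLong() - lastRefillNanos;
        long earned = elapsed / intervalNanos;
        if (earned > 0) {
            // Add whole tokens only, never exceeding the burst capacity.
            tokens += Math.min(earned, capacity - tokens);
            lastRefillNanos += earned * intervalNanos;
        }
        if (tokens == 0) {
            return false;
        }
        tokens--;
        return true;
    }
}
```

An external SPS would call tryAcquire() before each NN request (with 
System::nanoTime as the clock) and retry later on false. An SPS inside the NN 
would not pay the RPC cost, but would still need some limit on how much work it 
schedules.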

{quote}
I don't know if there are design documents on these topics yet, I have gleaned 
most of this from conversations with other contributors.
{quote}
As far as I know, the following features are trying to depend on SPS:
# https://issues.apache.org/jira/secure/attachment/12875795/HDFS-12090-design.001.pdf
# HDFS-7343

But both should work irrespective of whether SPS runs inside or outside the 
NN, as long as SPS functions properly as expected. All they need is a Java 
API for moving the file blocks. I think Anu’s point here is that SSM-like 
logic could move into the SPS process itself in the future. That could be a 
long-term plan; nothing has been planned right now, as far as I know.
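
A rough sketch of why the dependents should not care where SPS runs, as long 
as they code against one Java API. All names here (BlockMoveApi, 
InNamenodeSps, ExternalSps, DependentFeature) are hypothetical stand-ins, not 
the actual HDFS or SPS interfaces, and the two implementations only record 
requests instead of moving blocks.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical caller-facing API; not the actual HDFS interface. */
interface BlockMoveApi {
    void satisfyStoragePolicy(String path);
}

/** Stand-in for an SPS inside the NN: the request is queued locally. */
final class InNamenodeSps implements BlockMoveApi {
    final Queue<String> pending = new ArrayDeque<>();

    @Override
    public void satisfyStoragePolicy(String path) {
        pending.add(path);
    }
}

/** Stand-in for an external SPS: records what would be sent over RPC. */
final class ExternalSps implements BlockMoveApi {
    final Queue<String> sentRequests = new ArrayDeque<>();

    @Override
    public void satisfyStoragePolicy(String path) {
        sentRequests.add("satisfy:" + path);
    }
}

/** A dependent feature written only against the interface. */
final class DependentFeature {
    private final BlockMoveApi api;

    DependentFeature(BlockMoveApi api) {
        this.api = api;
    }

    void archive(String path) {
        // The caller does not know or care which SPS deployment is behind the API.
        api.satisfyStoragePolicy(path);
    }
}
```

The same DependentFeature code runs unchanged against either implementation, 
which is the property both dependents listed above would rely on.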

How do we progress further? Do we need to have an online meeting to reach 
consensus faster?




> Storage Policy Satisfier in Namenode
> ------------------------------------
>
>                 Key: HDFS-10285
>                 URL: https://issues.apache.org/jira/browse/HDFS-10285
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>    Affects Versions: HDFS-10285
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>         Attachments: HDFS-10285-consolidated-merge-patch-00.patch, 
> HDFS-10285-consolidated-merge-patch-01.patch, 
> HDFS-10285-consolidated-merge-patch-02.patch, 
> HDFS-10285-consolidated-merge-patch-03.patch, 
> HDFS-SPS-TestReport-20170708.pdf, 
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, 
> Storage-Policy-Satisfier-in-HDFS-May10.pdf, 
> Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policy. These 
> policies can be set on a directory/file to specify the user’s preference for 
> where to store the physical blocks. When the user sets the storage policy 
> before writing data, the blocks can take advantage of the storage policy 
> preference and are stored accordingly. 
> If the user sets the storage policy after writing and completing the file, 
> then the blocks will already have been written with the default storage 
> policy (nothing but DISK). The user has to run the ‘Mover tool’ explicitly, 
> specifying all such file names as a list. In some distributed system 
> scenarios (e.g. HBase) it would be difficult to collect all the files and run 
> the tool, as different nodes can write files separately and files can have 
> different paths.
> Another scenario is when the user renames a file from a directory with one 
> effective storage policy (inherited from the parent directory) into a 
> directory with a different effective storage policy. The inherited storage 
> policy is not copied from the source, so the policy of the destination 
> file/dir parent takes effect. This rename operation is just a metadata change 
> in the Namenode. The physical blocks still remain placed according to the 
> source storage policy.
> So, tracking all such business-logic-based file names from distributed nodes 
> (e.g. region servers) and running the Mover tool could be difficult for 
> admins. 
> Here the proposal is to provide an API in the Namenode itself to trigger 
> storage policy satisfaction. A daemon thread inside the Namenode should track 
> such calls and send them to the DNs as movement commands. 
> Will post the detailed design thoughts document soon. 



