[ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16517906#comment-16517906
 ] 

Uma Maheswara Rao G commented on HDFS-10285:
--------------------------------------------

Hi All,

After long offline discussions, I would like to summarize the current state of
the arguments/approaches.

From [~andrew.wang]: Interested in doing this process inside the NN to reduce
maintenance cost, etc. He also agreed to have this running optionally outside.

From [~anu]: He has no interest in running the process inside the NN; in fact
he was the one who proposed starting this process outside the NN. We have
worked so far to satisfy both arguments. In today's offline discussion, Anu
proposed going ahead with the merge using the existing workable external SPS
part in a first phase, and then continuing to improve the feature along the
proposed alternatives. This feature can be Alpha.

From [~chris.douglas]: He is fine with both options, and he proposed the
context-based abstractions that we agreed on and have implemented so far.

From [~daryn]: He is fine with running this process outside. If we want to run
this internal to the NN, he proposed coupling it with the RM instead of keeping
the logic/queues in a separate daemon thread.

From Uma: Interested primarily in running this with the NN, but I have no major
concerns with starting it as a separate process to move the project forward.

From [~rakeshr]: He has no concerns either way, as users can run it according
to their usage model.

Here I am trying to point out that the current code supports both options, but
the internal-to-NN path does not depend on the RM in the current code base. So,
how about we move forward with merging the external SPS option and continue
discussing internal SPS? Internal SPS will take time to integrate with the RM,
test, etc., and there may not be much common code anyway. While the discussion
on internal SPS continues, and since there are no concerns about external SPS,
we could move forward with merging external SPS. If that works, we will make
the necessary cleanups and go for the external SPS merge. However, we will not
mark this feature as Stable until we have run it for some time. So it should be
OK to keep improving the code incrementally instead of doing nothing and
blocking each other on arguments. Thanks

> Storage Policy Satisfier in Namenode
> ------------------------------------
>
>                 Key: HDFS-10285
>                 URL: https://issues.apache.org/jira/browse/HDFS-10285
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>    Affects Versions: HDFS-10285
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>            Priority: Major
>         Attachments: HDFS-10285-consolidated-merge-patch-00.patch, 
> HDFS-10285-consolidated-merge-patch-01.patch, 
> HDFS-10285-consolidated-merge-patch-02.patch, 
> HDFS-10285-consolidated-merge-patch-03.patch, 
> HDFS-10285-consolidated-merge-patch-04.patch, 
> HDFS-10285-consolidated-merge-patch-05.patch, 
> HDFS-SPS-TestReport-20170708.pdf, SPS Modularization.pdf, 
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, 
> Storage-Policy-Satisfier-in-HDFS-May10.pdf, 
> Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. 
> These policies can be set on a directory/file to specify the user's 
> preference for where the physical blocks should be stored. When the user 
> sets the storage policy before writing data, the blocks can take advantage 
> of the storage policy preferences and the physical blocks are stored 
> accordingly.
> If the user sets the storage policy after the file has been written and 
> completed, the blocks will already have been written with the default 
> storage policy (that is, DISK). The user then has to run the ‘Mover tool’ 
> explicitly, specifying all such file names as a list. In some distributed 
> system scenarios (e.g. HBase) it would be difficult to collect all the 
> files and run the tool, as different nodes can write files independently 
> and the files can have different paths.
> Another scenario: when the user renames a file from a directory with one 
> effective storage policy (inherited from the parent directory) into a 
> directory with a different effective storage policy, the inherited storage 
> policy is not copied from the source; the file takes the effective policy 
> of the destination parent directory. This rename operation is just a 
> metadata change in the Namenode; the physical blocks still remain placed 
> according to the source storage policy.
> So, tracking all such file names across distributed nodes (e.g. region 
> servers) based on business logic and then running the Mover tool could be 
> difficult for admins. The proposal here is to provide an API in the 
> Namenode itself to trigger storage policy satisfaction. A daemon thread 
> inside the Namenode would track such calls and send movement commands to 
> the DNs.
> Will post the detailed design thoughts document soon. 
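For readers less familiar with the flow sketched in the description above, here
is a rough illustration using the HDFS CLI. The -satisfyStoragePolicy
subcommand is the trigger this feature proposes to add; exact command names and
the example path are illustrative and may differ on the branch:

```shell
# Set a COLD policy on a directory whose data was already written;
# this is only a metadata change, existing blocks stay on DISK.
hdfs storagepolicies -setStoragePolicy -path /data/cold -policy COLD

# Today: the admin must run the Mover tool explicitly over such paths.
hdfs mover -p /data/cold

# Proposed: ask the Namenode to satisfy the policy asynchronously,
# so block movement is scheduled without tracking paths by hand.
hdfs storagepolicies -satisfyStoragePolicy -path /data/cold
```

Either way, the actual block movement is carried out by the Datanodes; the
difference is whether an external tool or the Namenode-side SPS schedules it.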



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
