[ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16300386#comment-16300386
 ] 

Uma Maheswara Rao G commented on HDFS-10285:
--------------------------------------------

*MEETING NOTES*
We had a meeting on 20 Dec 2017, mainly with the members who were involved in 
the discussions on the SPS JIRA.
 
Attendees: Anu, Eddy, ATM, Chris, Andrew, Vinay, Rakesh, Uma
 
We had long discussions on the available options, specifically:
 
 # Starting SPS within NN
 # Starting SPS as a separate service
 # Hybrid: SPS inside NN (like the RM) with fancier policies outside
 # Modularize SPS: SPS talks to NN via an interface, making it possible to pull 
SPS out easily while keeping SPS inside NN as one option.
 
 After all the discussions (most of the points reiterated the arguments already 
made on this JIRA), the consensus was that for the majority of clusters starting 
SPS within NN may be sufficient; for larger clusters, however, there is a 
reasonable argument for starting SPS separately. Another motivation for the 
modularized approach is that, over time, we could move other similar services 
out of NN in the same way.
 
*So, as a conclusion, we agreed to keep both options (SPS within NN and SPS as 
a service) available.* Users should be able to start SPS inside NN with no extra 
maintenance burden, and others should be able to start SPS as an independent 
service as well. The current implementation of SPS will serve as the internal 
service, and after refactoring, the necessary code can be added to serve as an 
independent service. 
Thank you, Chris, for proposing this approach, and thanks to the others for 
agreeing to it.
 
SPS should be refactored to establish a clean interface between NN and SPS. 
Right now, SPS calls go through the NN protocol; to keep SPS as a separate 
service as an option, it may be necessary to start the SPS RPC server on its 
own IP:port (whether SPS runs within NN or outside), so clients can always talk 
to SPS on that port, irrespective of where it is running. This keeps the API 
clean and identical across both approaches.
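
To make the intended boundary concrete, below is a minimal sketch of what such 
an interface might look like. All of the names (SpsContext, 
StoragePolicySatisfier, submitMoveTask, and so on) are hypothetical 
illustrations for this discussion, not the actual classes or methods in the 
branch.

{code:java}
// Hypothetical sketch only: illustrates one possible clean boundary between
// SPS and the Namenode, so the same SPS core can run in-process or remotely.
// None of these names are taken from the HDFS-10285 branch.
import java.io.IOException;

/** Everything SPS needs from the Namenode, hidden behind one interface. */
interface SpsContext {
  /** Returns true if the given file (by inode id) still exists. */
  boolean isFileExist(long inodeId) throws IOException;

  /** Returns the storage policy id currently set on the file. */
  byte getStoragePolicyId(long inodeId) throws IOException;

  /** Hands a block-movement task to the datanode scheduling layer. */
  void submitMoveTask(long blockId, String sourceStorageType, String targetStorageType)
      throws IOException;
}

/** The SPS core; identical whether it runs inside NN or as a separate daemon. */
interface StoragePolicySatisfier {
  void start(SpsContext context);
  void addTrackedPath(long inodeId);   // a client asked to satisfy this path
  void stop();
}
{code}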
 
When SPS runs outside NN, we will also need security and HA; we will probably 
handle these post-merge.
 
High level tasks: 
# Refactor SPS into the modularized approach.
# Start the SPS service on its own port (whether within NN or outside).
# Provide the necessary plugin implementations to serve as an independent 
service, without disturbing the ability to start inside NN (see the sketch 
after this list).
# Add the necessary start/stop scripts for SPS (only for SPS outside NN).
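
As a rough illustration of task 3, the same SPS core from the interface sketch 
above could be wired either into the NN process or into a standalone daemon. 
The class names below are again hypothetical placeholders, not code from the 
branch.

{code:java}
// Hypothetical wiring sketch for task #3, building on the interfaces sketched
// above: the same StoragePolicySatisfier core is started either inside the
// Namenode or from a standalone daemon.
public final class SpsLaunchSketch {

  /** In-NN mode: the Namenode supplies a context backed directly by its own namesystem. */
  static void startInternal(StoragePolicySatisfier sps, SpsContext nnBackedContext) {
    sps.start(nnBackedContext);   // same core, no extra daemon or RPC hop
  }

  /**
   * External mode: a standalone daemon supplies a context that reaches the NN
   * over RPC, and exposes the SPS RPC endpoint on its own IP:port as discussed
   * above, so clients see the same API in both modes.
   */
  static void startExternal(StoragePolicySatisfier sps, SpsContext rpcBackedContext) {
    sps.start(rpcBackedContext);
  }
}
{code}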
 
A separate document will be posted specifically to explain the component 
interactions when SPS is started as a separate service, and the tasks will be 
prioritized for the merge. The current design doc holds good for the internal 
SPS.
 
 
Finally, based on their interests, users can choose to start SPS within NN or 
outside NN.

@ attendees: please correct me if I missed any points covered in the discussion.
 
*[~daryn]: hope this is agreeable to you. Please feel free to comment if you 
have any concerns.*

Thank you all for the productive discussions. 

> Storage Policy Satisfier in Namenode
> ------------------------------------
>
>                 Key: HDFS-10285
>                 URL: https://issues.apache.org/jira/browse/HDFS-10285
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>    Affects Versions: HDFS-10285
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>         Attachments: HDFS-10285-consolidated-merge-patch-00.patch, 
> HDFS-10285-consolidated-merge-patch-01.patch, 
> HDFS-10285-consolidated-merge-patch-02.patch, 
> HDFS-10285-consolidated-merge-patch-03.patch, 
> HDFS-SPS-TestReport-20170708.pdf, 
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, 
> Storage-Policy-Satisfier-in-HDFS-May10.pdf, 
> Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. These 
> policies can be set on a directory/file to specify the user's preference for 
> where the physical blocks should be stored. When the user sets the storage 
> policy before writing data, the blocks can take advantage of the policy and 
> are placed on the preferred storage accordingly. 
> If the user sets the storage policy after the file has been written and 
> completed, the blocks will already have been written with the default storage 
> policy (namely DISK). The user then has to run the ‘Mover tool’ explicitly, 
> specifying all such file names as a list. In some distributed-system scenarios 
> (e.g. HBase) it would be difficult to collect all the files and run the tool, 
> as different nodes can write files independently and the files can have 
> different paths.
> Another scenario is renames: when the user renames a file from a directory 
> with one effective storage policy (inherited from the parent directory) into 
> a directory with a different effective storage policy, the inherited storage 
> policy is not copied from the source; the file's effective policy comes from 
> the destination file/dir parent. This rename operation is just a metadata 
> change in the Namenode, so the physical blocks still remain placed according 
> to the source storage policy.
> So, tracking all such application-specific file names across distributed 
> nodes (e.g. region servers) and running the Mover tool could be difficult for 
> admins. The proposal here is to provide an API in the Namenode itself to 
> trigger storage policy satisfaction. A daemon thread inside the Namenode 
> should track such calls and dispatch block-movement commands to the DNs. 
> Will post the detailed design document soon.
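
For readers following this thread, the flow described above looks roughly like 
the sketch below. DistributedFileSystem#setStoragePolicy already exists today; 
the satisfy call is only a stand-in for the API being proposed here, and its 
final name and signature may differ.

{code:java}
// Rough illustration of the scenario in the issue summary: the policy is set
// after the data was already written with the default (DISK) policy, so the
// existing blocks no longer match it until they are moved.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SpsUsageSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumes fs.defaultFS points at an HDFS cluster.
    DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
    Path dir = new Path("/archive/logs");

    // Change the policy on data that already exists; only NN metadata changes.
    dfs.setStoragePolicy(dir, "COLD");

    // Today the admin must run the Mover tool over every affected path.
    // With SPS, a single call (hypothetical name, per the proposal above) would
    // ask the NN to track the path and push block-movement commands to the DNs:
    // dfs.satisfyStoragePolicy(dir);
  }
}
{code}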


