[ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16107874#comment-16107874
 ] 

Lei (Eddy) Xu commented on HDFS-10285:
--------------------------------------

Hey, [~umamaheswararao]. Thanks for the great work!

I have a few nits and questions: 

* Non-recursively set xattr.  Please reconsider using a recursive async 
call.  If the use cases mostly target downstream projects such as HBase, the 
chance of these projects mistakenly calling {{satisfyStoragePolicy}} on the 
wrong directory (e.g., "/") is small, but a non-recursive call makes it hard 
for them to manage a large or deep namespace: HBase would need to iterate the 
namespace itself and issue the same number of "setXattr" calls anyway (because 
the # of files to move is the same).  This is similar to "rm -rf /": it is bad 
that "rm" allows it, but IMO that should not prevent users / applications from 
using "rm -rf" in a sensible way. 

* The newly added {{public void removeXattr(long id, String xattrName)}}. While 
its name seems very generic, it seems to accept only the SPS xattr as a valid 
parameter. Should we demote it from the public API in {{Namesystem}}?

* Would it make sense to have an admin command to unset SPS on a path, so that 
a user can undo their own mistake? 

* {{FSNamesystem#satisfyStoragePolicy}}. Does this only set the xattr? Can we 
set the xattr without SPS running? I was thinking of the following scenario: 
some downstream projects (e.g., HBase) start to routinely use this API, while 
for some reason (e.g., the mover is running or the cluster is misconfigured) 
SPS is not running. Should we still allow these projects to successfully call 
{{satisfyStoragePolicy()}}, and allow SPS to catch up later on? 

* Since this call essentially triggers a large async background task, should 
we add some logging here? Similarly, it'd be nice to have related JMX stats 
and some indication in the web UI, to make it easier to integrate with other 
systems.
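
A rough sketch of the trade-off in the first bullet, under a deliberately 
simplified model. {{Node}}, {{ToyNamenode}} and {{RecursiveVsClientWalk}} are 
hypothetical in-memory stand-ins, not HDFS classes, and the non-recursive call 
is assumed to cover one file per call. The point is only that the number of 
files marked is the same either way, while the number of client calls is not.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a namespace entry; not an HDFS class.
final class Node {
    final String name;
    final boolean isDir;
    final List<Node> children = new ArrayList<>();

    Node(String name, boolean isDir) {
        this.name = name;
        this.isDir = isDir;
    }

    Node add(Node child) {
        children.add(child);
        return this;
    }
}

// Hypothetical stand-in for the NameNode side; it only counts calls and files.
final class ToyNamenode {
    int rpcCalls = 0;
    int filesMarked = 0;

    // Non-recursive flavour (assumed): one client call marks exactly one file.
    void satisfyFile(Node file) {
        rpcCalls++;
        filesMarked++;
    }

    // Recursive flavour: one client call, the server walks the subtree.
    void satisfyRecursive(Node root) {
        rpcCalls++;
        markAll(root);
    }

    private void markAll(Node n) {
        if (!n.isDir) {
            filesMarked++;
            return;
        }
        for (Node c : n.children) {
            markAll(c);
        }
    }
}

final class RecursiveVsClientWalk {
    // What a downstream project has to do if only the per-file call exists.
    static void clientWalk(ToyNamenode nn, Node n) {
        if (!n.isDir) {
            nn.satisfyFile(n);
            return;
        }
        for (Node c : n.children) {
            clientWalk(nn, c);
        }
    }

    // A tiny table/region/file layout, made up for illustration.
    static Node sampleTree() {
        Node region1 = new Node("region1", true)
            .add(new Node("hfile-a", false))
            .add(new Node("hfile-b", false));
        Node region2 = new Node("region2", true)
            .add(new Node("hfile-c", false));
        return new Node("table", true).add(region1).add(region2);
    }

    public static void main(String[] args) {
        ToyNamenode perFile = new ToyNamenode();
        clientWalk(perFile, sampleTree());

        ToyNamenode recursive = new ToyNamenode();
        recursive.satisfyRecursive(sampleTree());

        System.out.println("per-file:  rpcs=" + perFile.rpcCalls
            + " files=" + perFile.filesMarked);
        System.out.println("recursive: rpcs=" + recursive.rpcCalls
            + " files=" + recursive.filesMarked);
    }
}
```

In this toy model the client walk issues three calls for three files, while 
the recursive variant issues one call for the same three files. A real 
implementation would also have to bound the server-side walk, which this 
sketch ignores.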


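On the question of whether {{satisfyStoragePolicy()}} could succeed while SPS 
is not running, one possible shape is "accept now, process later". The sketch 
below is a hypothetical model, not the code in the patch: {{DeferredSatisfier}} 
and all of its methods are made-up names, the queue stands in for whatever the 
xattr marks, and the counters stand in for the kind of numbers that could be 
exposed through JMX.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.logging.Logger;

// Hypothetical model, not FSNamesystem: requests are recorded whether or not
// the satisfier is running, and are drained once it is started.
final class DeferredSatisfier {
    private static final Logger LOG = Logger.getLogger("DeferredSatisfier");

    private final Queue<String> pending = new ArrayDeque<>();
    private final List<String> processed = new ArrayList<>();
    private long accepted = 0;
    private boolean running = false;

    // Always succeeds; logs the request so the background work is traceable.
    synchronized void satisfyStoragePolicy(String path) {
        pending.add(path);
        accepted++;
        LOG.info("Queued storage policy satisfaction for " + path
            + " (satisfier running: " + running + ")");
        if (running) {
            drain();
        }
    }

    // Starting the satisfier lets it catch up on everything queued so far.
    synchronized void start() {
        running = true;
        drain();
    }

    synchronized void stop() {
        running = false;
    }

    // Stand-in for scheduling block movements; here it only records the path.
    private void drain() {
        String path;
        while ((path = pending.poll()) != null) {
            processed.add(path);
        }
    }

    // Counters of the kind that could back JMX stats or a web UI field.
    synchronized int getPendingCount() {
        return pending.size();
    }

    synchronized long getAcceptedCount() {
        return accepted;
    }

    synchronized List<String> getProcessed() {
        return new ArrayList<>(processed);
    }
}
```

With this shape, a call made while the satisfier is stopped leaves one pending 
entry, and a later {{start()}} drains it. Whether the pending state should 
survive a NameNode restart is a separate question that the sketch does not 
address.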


> Storage Policy Satisfier in Namenode
> ------------------------------------
>
>                 Key: HDFS-10285
>                 URL: https://issues.apache.org/jira/browse/HDFS-10285
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>    Affects Versions: HDFS-10285
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>         Attachments: HDFS-10285-consolidated-merge-patch-00.patch, 
> HDFS-10285-consolidated-merge-patch-01.patch, 
> HDFS-SPS-TestReport-20170708.pdf, 
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, 
> Storage-Policy-Satisfier-in-HDFS-May10.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. 
> These policies can be set on a directory or file to specify the user's 
> preference for where the physical blocks are stored. When the user sets the 
> storage policy before writing data, the blocks can take advantage of the 
> policy and are stored accordingly. 
> If the user sets the storage policy after writing and completing the file, 
> the blocks will already have been written with the default storage policy 
> (DISK). The user then has to run the ‘Mover tool’ explicitly, specifying all 
> such file names as a list. In some distributed system scenarios (e.g., HBase) 
> it would be difficult to collect all the files and run the tool, as different 
> nodes can write files separately and the files can have different paths.
> Another scenario is when the user renames a file from a directory with one 
> effective storage policy (inherited from the parent directory) into a 
> directory with a different effective storage policy. The rename does not 
> copy the inherited storage policy from the source, so the file takes on the 
> storage policy of the destination's parent. The rename is just a metadata 
> change in the Namenode; the physical blocks still remain placed according to 
> the source storage policy.
> So tracking all such file names, which depend on business logic, across 
> distributed nodes (e.g., region servers) and running the Mover tool could be 
> difficult for admins. 
> The proposal here is to provide an API in the Namenode itself to trigger 
> storage policy satisfaction. A daemon thread inside the Namenode should track 
> such calls and send them to the DNs as movement commands. 
> Will post the detailed design thoughts document soon. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
