[
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258879#comment-16258879
]
Rakesh R commented on HDFS-10285:
---------------------------------
Thanks a lot [~anu] for your time and comments.
bq. I have no problem with the feature, just that I think we should expose
whatever information you want from namenode and run as an independent tool. It
really is a tool like Balancer.
IIUC, Balancer does not need any input file paths; it balances the HDFS
cluster based on utilization, so it can run independently. The Mover tool, on
the other hand, also runs independently as a tool but takes input file paths
as an argument. The pain point here is that the admin has to collect all the
file paths (an application can dynamically change the storage policy for a
path) and trigger the Mover tool externally to satisfy the storage policy for
those files. The SPS feature is an enhancement to the HSM mechanism: it finds
the storage policy mismatches under a user-given path and satisfies the
policy.
Basically, we started this feature when the HBase community showed interest
in having HDFS handle file block movements programmatically, so that they
could easily integrate it into HBase via a Java API. HBase can invoke
{{dfs#satisfyStoragePolicy(path)}} right after changing the storage policy for
a path. This feature is switched OFF by default and has no impact on HDFS
while disabled. Also, we have a dynamic switch ON/OFF mechanism via
reconfiguration, without restarting the NN. After the feature is enabled,
block movements are triggered only if somebody invokes the
{{dfs#satisfyStoragePolicy(path)}} API; otherwise the SPS daemon stays
completely silent. Again, this is not a key flow in the Namenode. We have
provided a throttling mechanism so that SPS does not overload the Namenode
and does not keep much data in memory.
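To make the behaviour described above concrete, here is a minimal toy model in
Java. It is only a sketch under stated assumptions, not the SPS code in the
patch: the class {{SpsSketch}}, its fields, the {{processBatch}} method, and
the choice to reject a call while the feature is disabled or the queue is full
are all hypothetical and exist purely for illustration. It shows three things:
the feature is OFF until switched on at runtime, nothing is scheduled unless
{{satisfyStoragePolicy(path)}} has been called, and a bounded queue plus a
per-pass batch limit act as a simple throttle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Toy model only; all names here are hypothetical, not HDFS classes. */
class SpsSketch {
    private final Queue<String> pending = new ArrayDeque<>();
    private final int maxPending;              // assumed cap on queued paths
    private volatile boolean enabled = false;  // feature is OFF by default

    SpsSketch(int maxPending) {
        this.maxPending = maxPending;
    }

    /** Models the runtime ON/OFF switch (no restart involved). */
    void setEnabled(boolean on) {
        enabled = on;
    }

    /**
     * Models a satisfyStoragePolicy(path) call: it only queues the path.
     * Returning false when disabled or full is an assumption of this sketch.
     */
    synchronized boolean satisfyStoragePolicy(String path) {
        if (!enabled || pending.size() >= maxPending) {
            return false;
        }
        pending.add(path);
        return true;
    }

    /** One daemon pass: takes at most batchSize paths; empty if nothing queued. */
    synchronized List<String> processBatch(int batchSize) {
        List<String> scheduled = new ArrayList<>();
        while (scheduled.size() < batchSize && !pending.isEmpty()) {
            scheduled.add(pending.poll());
        }
        return scheduled;
    }

    public static void main(String[] args) {
        SpsSketch sps = new SpsSketch(2);
        System.out.println(sps.satisfyStoragePolicy("/hbase/a")); // feature still OFF
        sps.setEnabled(true);
        System.out.println(sps.satisfyStoragePolicy("/hbase/a"));
        System.out.println(sps.satisfyStoragePolicy("/hbase/b"));
        System.out.println(sps.satisfyStoragePolicy("/hbase/c")); // queue is full
        System.out.println(sps.processBatch(1));
        System.out.println(sps.processBatch(5));
        System.out.println(sps.processBatch(5));                  // nothing left to do
    }
}
```

In this model the daemon pass does no work at all when no caller has queued a
path, which mirrors the "completely silent" behaviour above; the real limits
and the real handling of a disabled feature are whatever the patch defines.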
Please find the [HBase use
case|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16120227&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16120227]
mentioned by [~anoop.hbase]. In general, the idea behind the SPS feature is to
provide a mechanism for HBase-like systems so that they can make efficient use
of the HSM feature.
> Storage Policy Satisfier in Namenode
> ------------------------------------
>
> Key: HDFS-10285
> URL: https://issues.apache.org/jira/browse/HDFS-10285
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, namenode
> Affects Versions: HDFS-10285
> Reporter: Uma Maheswara Rao G
> Assignee: Uma Maheswara Rao G
> Attachments: HDFS-10285-consolidated-merge-patch-00.patch,
> HDFS-10285-consolidated-merge-patch-01.patch,
> HDFS-10285-consolidated-merge-patch-02.patch,
> HDFS-10285-consolidated-merge-patch-03.patch,
> HDFS-SPS-TestReport-20170708.pdf,
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf,
> Storage-Policy-Satisfier-in-HDFS-May10.pdf,
> Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policy. These
> policies can be set on a directory/file to specify the user's preference for
> where to store the physical blocks. When the user sets the storage policy
> before writing data, the blocks can take advantage of the storage policy
> preferences and the physical blocks are stored accordingly.
> If the user sets the storage policy after writing and completing the file,
> the blocks would have been written with the default storage policy (nothing
> but DISK). The user has to run the ‘Mover tool’ explicitly, specifying all
> such file names as a list. In some distributed system scenarios (e.g. HBase)
> it would be difficult to collect all the files and run the tool, as different
> nodes can write files separately and files can have different paths.
> Another scenario is when the user renames a file from a directory with one
> effective storage policy (inherited from the parent directory) to a directory
> with a different effective storage policy. The rename does not copy the
> inherited storage policy from the source, so the policy of the destination
> file/dir parent takes effect. This rename operation is just a metadata change
> in the Namenode. The physical blocks still remain under the source storage
> policy.
> So, tracking all such files based on business logic across distributed nodes
> (e.g. region servers) and running the Mover tool could be difficult for
> admins.
> Here the proposal is to provide an API from the Namenode itself to trigger
> storage policy satisfaction. A daemon thread inside the Namenode should track
> such calls and send them to the DNs as movement commands.
> Will post the detailed design thoughts document soon.