[ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279550#comment-16279550
 ] 

Andrew Wang commented on HDFS-10285:
------------------------------------

Thanks for chiming in, Daryn.

bq. My preference is this feature, like all scan features, should be outside 
the NN. Integrated functionality is arguably more user-friendly but it comes 
with its own costs. Namely increased complexity and maintenance. It's yet 
another feature to accommodate in future core features.

Keeping it in the NameNode is easier from a deployment standpoint. It's 
arguable whether this benefit is more important than the benefits of making it 
separate.

I'm coming at this from the standpoint of supporting Cloudera's Hadoop 
customers. For a large, sophisticated Hadoop user like Yahoo, it may not be a 
big cost to deploy a new service, but in relative terms it's a much bigger cost for 
a small user. Being able to reach in and kill a rogue process or iteratively 
test new versions is great when you're a power user, but not for the average 
Hadoop admin who wants this to be turnkey. You'd be amazed at the cluster-write 
support tickets we've resolved by saying "run the balancer", just because it 
doesn't run automatically. I've fielded similar questions about HSM that were 
answered by "run the mover". It's the first thing users trip over.

Replying to the other concerns: we already have mechanisms for reconfiguring 
the NN, so I don't see that as an inherent limitation. Running on a precisely 
scheduled basis doesn't seem inherent either, and it isn't what Anu was 
proposing, since the SPS would still be triggered by an NN RPC, not by cron or 
something similar.

Finally, the SPS is off by default, and pretty safe since the new code sits 
separate from the rest of the NN paths. There's also already a separate mover 
command which runs like the balancer, for users who prefer that.

Are there still outstanding concerns with merging this? Uma proposed a call 
above, and I think that's the next step if we still need to reach consensus.

> Storage Policy Satisfier in Namenode
> ------------------------------------
>
>                 Key: HDFS-10285
>                 URL: https://issues.apache.org/jira/browse/HDFS-10285
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>    Affects Versions: HDFS-10285
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>         Attachments: HDFS-10285-consolidated-merge-patch-00.patch, 
> HDFS-10285-consolidated-merge-patch-01.patch, 
> HDFS-10285-consolidated-merge-patch-02.patch, 
> HDFS-10285-consolidated-merge-patch-03.patch, 
> HDFS-SPS-TestReport-20170708.pdf, 
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, 
> Storage-Policy-Satisfier-in-HDFS-May10.pdf, 
> Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. These 
> policies can be set on a directory or file to specify the user's preference for 
> where to store the physical blocks. When the user sets the storage policy before 
> writing data, the blocks can take advantage of the storage policy preference and 
> are stored accordingly. 
> If the user sets the storage policy after writing and completing the file, 
> the blocks will already have been written with the default storage policy (that 
> is, DISK). The user then has to run the ‘Mover tool’ explicitly, specifying all 
> such file names as a list. In some distributed system scenarios (e.g. HBase) it 
> would be difficult to collect all the files and run the tool, as different 
> nodes can write files separately and the files can have different paths.
> Another scenario is when the user renames a file affected by one storage 
> policy (inherited from its parent directory) into a directory affected by 
> another storage policy. The rename does not copy the inherited storage policy 
> from the source, so the policy of the destination file/dir's parent takes 
> effect. This rename operation is just a metadata change in the Namenode. The 
> physical blocks still remain placed according to the source storage policy.
> So, tracking all such files, whose names depend on business logic, and running 
> the Mover tool could be difficult for admins across distributed nodes (e.g. 
> region servers). 
> The proposal here is to provide an API in the Namenode itself to trigger 
> storage policy satisfaction. A daemon thread inside the Namenode should track 
> such calls and send them to the DNs as movement commands. 
> Will post the detailed design thoughts document soon. 
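
For readers following the thread, the shape of the proposal in the description (a trigger call that only records the request, plus a daemon thread that later turns tracked requests into movement commands) can be sketched roughly as below. This is a toy model under stated assumptions, not the patch's implementation: the class `SatisfierSketch`, its methods, and the `"MOVE <path>"` command strings are all hypothetical and stand in for the real NN/DN machinery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

/** Toy model only: none of these names are taken from the HDFS-10285 patch. */
public class SatisfierSketch {
    // Paths submitted through the (hypothetical) trigger call, awaiting processing.
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    // Commands produced by the worker; a stand-in for commands sent to DNs.
    private final List<String> commands = new CopyOnWriteArrayList<>();

    public SatisfierSketch() {
        Thread worker = new Thread(this::drain, "sps-sketch");
        worker.setDaemon(true); // background thread, does not block JVM exit
        worker.start();
    }

    /** Stand-in for the trigger RPC: records the path and returns immediately. */
    public void satisfyStoragePolicy(String path) {
        pending.add(path);
    }

    /** Worker loop: take tracked paths in FIFO order and emit one command each. */
    private void drain() {
        try {
            while (true) {
                String path = pending.take(); // blocks until a request arrives
                commands.add("MOVE " + path);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop quietly when interrupted
        }
    }

    /** Snapshot of the commands produced so far. */
    public List<String> commandsSoFar() {
        return new ArrayList<>(commands);
    }

    /** Polls until at least n commands exist or the timeout expires. */
    public boolean awaitCommands(int n, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        try {
            while (commands.size() < n) {
                if (System.currentTimeMillis() > deadline) {
                    return false;
                }
                Thread.sleep(5);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In this sketch the caller never waits on block movement: `satisfyStoragePolicy` only enqueues, which mirrors the point that the trigger is a cheap call and the work happens asynchronously on the tracking thread.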



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
