[ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279113#comment-16279113
 ] 

Daryn Sharp commented on HDFS-10285:
------------------------------------

I lost my initial cursory notes on bugs in this patch.  I was immediately 
pondering if this feature really did belong in the NN but was going to overlook 
it for initial review.  I'm glad others raised the question.

My preference is that this feature, like all scan features, should live outside 
the NN.  Integrated functionality is arguably more user-friendly, but it comes 
with its own costs, namely increased complexity and maintenance burden.  It's 
yet another feature that future core features will have to accommodate.

There are many basic issues with integrated scan features: truly being able to 
reconfigure on the fly; being able to run on a precisely scheduled basis; 
being able to immediately and definitively kill the scan if it's causing 
problems or the cluster is under unusual distress; and being able to 
iteratively test new versions without bouncing the standby with the new 
version, failing over, and failing back if it's not working as intended.  An 
adjunct service does not have these issues.

That said: _the cited issues with the balancer are actually a plus for me_.  I 
don't love the balancer itself but I love it being a separate service.

I need to see exactly what sort of RPC calls would be necessary for it to be 
feasible as a separate service.  So long as it's a cheap read load, the NN can 
handle at least 80k ops/sec (audit logged), and upwards of 300k ops/sec (if not 
audit logged).
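
To put those throughput figures in perspective, here is a rough back-of-envelope 
sketch.  The namespace size (100 million files) and the one-read-RPC-per-file 
cost are hypothetical values chosen only for illustration, not numbers from the 
patch or the design documents, and the function name is made up for this 
example.  It also assumes the scanner gets the NN's entire read capacity, so 
the result is a best-case lower bound rather than a prediction.

```python
def scan_duration_seconds(num_files, rpcs_per_file, nn_ops_per_sec):
    """Best-case seconds for an external scanner to issue all of its read RPCs,
    assuming the NN sustains nn_ops_per_sec for the scanner alone."""
    if nn_ops_per_sec <= 0:
        raise ValueError("nn_ops_per_sec must be positive")
    return num_files * rpcs_per_file / nn_ops_per_sec


# Throughput figures quoted in the comment above.
AUDIT_LOGGED_OPS_PER_SEC = 80_000
UNLOGGED_OPS_PER_SEC = 300_000

# Hypothetical workload: these two values are assumptions for illustration.
HYPOTHETICAL_NUM_FILES = 100_000_000
HYPOTHETICAL_RPCS_PER_FILE = 1

if __name__ == "__main__":
    for label, rate in (("audit logged", AUDIT_LOGGED_OPS_PER_SEC),
                        ("not audit logged", UNLOGGED_OPS_PER_SEC)):
        seconds = scan_duration_seconds(
            HYPOTHETICAL_NUM_FILES, HYPOTHETICAL_RPCS_PER_FILE, rate)
        print(f"{label}: about {seconds / 60:.1f} minutes")
```

Under these assumptions a full pass is a matter of minutes even at the 
audit-logged rate, which is why the exact set and cost of the required RPC 
calls matters more than the raw file count.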

> Storage Policy Satisfier in Namenode
> ------------------------------------
>
>                 Key: HDFS-10285
>                 URL: https://issues.apache.org/jira/browse/HDFS-10285
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>    Affects Versions: HDFS-10285
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>         Attachments: HDFS-10285-consolidated-merge-patch-00.patch, 
> HDFS-10285-consolidated-merge-patch-01.patch, 
> HDFS-10285-consolidated-merge-patch-02.patch, 
> HDFS-10285-consolidated-merge-patch-03.patch, 
> HDFS-SPS-TestReport-20170708.pdf, 
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, 
> Storage-Policy-Satisfier-in-HDFS-May10.pdf, 
> Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. 
> These policies can be set on a directory or file to specify the user's 
> preference for where the physical blocks should be stored. When the user sets 
> the storage policy before writing data, the blocks take advantage of the 
> storage policy preference and are stored accordingly. 
> If the user sets the storage policy after the file has been written and 
> completed, the blocks will already have been written with the default storage 
> policy (that is, on DISK). The user then has to run the 'Mover tool' 
> explicitly, specifying all such file names as a list. In some distributed 
> system scenarios (e.g. HBase) it would be difficult to collect all the files 
> and run the tool, because different nodes can write files separately and the 
> files can have different paths.
> Another scenario is a rename: when the user renames a file from a directory 
> governed by one storage policy (inherited from the parent directory) into a 
> directory governed by another storage policy, the inherited policy is not 
> copied from the source. The file therefore takes its effective policy from 
> the destination file/dir's parent. The rename operation is just a metadata 
> change in the Namenode; the physical blocks still remain placed according to 
> the source storage policy.
> So tracking all such file names, which depend on business logic, across 
> distributed nodes (e.g. region servers) and then running the Mover tool could 
> be difficult for admins. The proposal here is to provide an API in the 
> Namenode itself to trigger storage policy satisfaction. A daemon thread 
> inside the Namenode should track such calls and send them to the DNs as 
> movement commands. 
> Will post the detailed design thoughts document soon. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
