[
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282419#comment-16282419
]
Daryn Sharp commented on HDFS-10285:
------------------------------------
bq. I'm coming at this from the standpoint of supporting Cloudera's Hadoop
customers. \[…\] the average Hadoop admin who wants this to be turnkey
If a cluster is a magical turnkey, it’s just an implementation detail whether
it’s a monolithic service or collection of managed services. I understand the
support burden of telling customers “run this, run that”, but isn’t that a
deficiency of Ambari, Cloudera Manager, etc?
Here's a rhetorical question: If managing multiple services is hard, why not
bundle oozie, spark, storm, sqoop, kafka, ranger, knox, hive server, etc in the
same process? Or ZK so HA is easier to deploy/manage?
bq. For a large, sophisticated Hadoop user like Yahoo, it may not be a big cost
to deploy a new service, but in relative terms a much bigger cost for a small
user.
Cluster upgrades are formal. A release can take weeks or months to reach
critical production clusters. If a scale-level bug is found near the end of the
runway, it's hard to avoid restarting and rescheduling the entire runway.
On the other hand, an adjunct service like the kms has an extremely short
runway. The balancer, being generally non-critical, has a lot of leeway and
when necessary can be tinkered on w/o a deployment.
Tech support asking a user to start a process costs less?
Anyway, fast review.
*Locking*
Yesterday, I was going to say I'm not overly worried about the locking beyond
correctness, and that it doesn't affect whether the feature should live in or
out of the NN.
Today, I looked at the code more closely. It can hold the lock (read lock, but
still) way too long. Notably, but not limited to, _you can’t hold the lock
while doing block placement_.
Being in the NN makes it too easy to abuse the lock in subtle ways.
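The safer pattern is to snapshot only the state placement needs while holding the lock, then do the expensive placement computation with the lock released. A minimal sketch (the `fsnLock`, `BlockInfo`, and `chooseTarget` names here are illustrative, not the actual SPS code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockScopeSketch {
    static final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();

    // Immutable copy of just the fields placement needs.
    record BlockInfo(long blockId, String currentStorage) {}

    static List<BlockInfo> snapshotBlocksUnderLock() {
        fsnLock.readLock().lock();
        try {
            // Keep the critical section short: copy, don't compute.
            List<BlockInfo> copy = new ArrayList<>();
            copy.add(new BlockInfo(1L, "DISK"));
            copy.add(new BlockInfo(2L, "DISK"));
            return copy;
        } finally {
            fsnLock.readLock().unlock();
        }
    }

    static String chooseTarget(BlockInfo b) {
        // Stand-in for expensive block placement; must run lock-free.
        return "ARCHIVE";
    }

    public static void main(String[] args) {
        for (BlockInfo b : snapshotBlocksUnderLock()) {
            System.out.println(b.blockId() + " -> " + chooseTarget(b));
        }
    }
}
```

Even a read lock held through placement starves writers; the snapshot trades a little extra garbage for a bounded critical section.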
*Memory*
Bounded queues are not a panacea for memory concerns. I’m more concerned with
GC issues. Throttling via queues is going to result in promotion to oldgen
where collection is much more expensive.
The memory estimate is narrowly focused and assumes a 32-bit JVM. It omits all
the ancillary heavyweight data structures, futures, etc.
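To see why the JVM assumption matters, a back-of-envelope sketch (the sizes below are the usual assumed HotSpot figures, not measurements of the SPS structures): a 64-bit JVM with compressed oops uses roughly 12-byte object headers and 4-byte references, versus 16-byte headers and 8-byte references without them.

```java
public class QueueHeapEstimate {
    // Assumed per-item shape: header + two longs + two references,
    // rounded up to the JVM's 8-byte object alignment.
    static long perItemBytes(int headerBytes, int refBytes) {
        int raw = headerBytes + 2 * 8 + 2 * refBytes;
        return (raw + 7) / 8 * 8;
    }

    public static void main(String[] args) {
        long items = 1_000_000;
        System.out.println("compressed oops: "
            + items * perItemBytes(12, 4) / (1 << 20) + " MiB");
        System.out.println("uncompressed:    "
            + items * perItemBytes(16, 8) / (1 << 20) + " MiB");
    }
}
```

The raw numbers understate the real cost anyway: items parked in a bounded queue long enough to be throttled survive young-gen collections and get promoted, which is exactly the old-gen pressure described above.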
*CPU*
Yesterday, I wasn't too worried, based on the misconception that very little
locking was occurring.
Today, I see there’s an incredible amount of computation occurring which often
appears to be within the fsn lock. There’s a lot of garbage generation which
invisibly saps cpu too.
*Other*
bq. This feature is switched OFF by default and no impact to HDFS.
I should start sending bills to everyone who makes this fraudulent claim. :).
{{FSDirectory#addToInodeMap}} imposes a nontrivial performance penalty even
when SPS is not enabled. We had to hack out the similar EZ check because it
had a noticeable performance impact, especially on startup. However, now that
we support EZ, I need to revisit optimizing it.
There’s likely more performance hits if I looked harder at where it’s spliced
in.
bq. NN has existing feature EDEK which also does scanning and we reuse the
same code in SPS.
Yes, and I’m not very happy about that feature’s implementation but it was
jammed in.
––
I’m torn on this issue. I think the HSM experience is lackluster and needs to
be improved. I haven’t looked at the Mover, so I have no idea how well it works
or doesn’t work. If it works ok, then perhaps it should have an RPC service to
poke something to the front of the queue for those, like HBase, that don’t
want to wait.
If it’s an internal service, I’d rather it work in a dumbed-down background
fashion. Otherwise it’s going to be a real problem as it becomes too smart and
bloated. I’m curious why it isn’t just part of the standard replication
monitoring. If the DN is told to replicate to itself, it just does the storage
movement.
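That last idea can be sketched concretely (all names here are hypothetical, not existing HDFS classes): if the standard replication monitor issues a transfer whose target is the same datanode but a different storage type, the DN satisfies it with a local move instead of a network copy.

```java
public class ReplicationCommandSketch {
    // Simplified transfer command: which DN copies to which, and between
    // which storage types.
    record TransferCmd(String sourceDn, String targetDn,
                       String sourceStorage, String targetStorage) {}

    static String handle(TransferCmd cmd) {
        // "Replicate to yourself onto a different storage" degenerates
        // into a local storage movement -- no new machinery needed.
        if (cmd.sourceDn().equals(cmd.targetDn())
                && !cmd.sourceStorage().equals(cmd.targetStorage())) {
            return "LOCAL_MOVE " + cmd.sourceStorage() + "->" + cmd.targetStorage();
        }
        return "NETWORK_COPY to " + cmd.targetDn();
    }

    public static void main(String[] args) {
        System.out.println(handle(new TransferCmd("dn1", "dn1", "DISK", "ARCHIVE")));
        System.out.println(handle(new TransferCmd("dn1", "dn2", "DISK", "DISK")));
    }
}
```

The appeal of this shape is that the mover logic rides on replication monitoring the NN already does, rather than growing a second, smarter scheduler.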
> Storage Policy Satisfier in Namenode
> ------------------------------------
>
> Key: HDFS-10285
> URL: https://issues.apache.org/jira/browse/HDFS-10285
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, namenode
> Affects Versions: HDFS-10285
> Reporter: Uma Maheswara Rao G
> Assignee: Uma Maheswara Rao G
> Attachments: HDFS-10285-consolidated-merge-patch-00.patch,
> HDFS-10285-consolidated-merge-patch-01.patch,
> HDFS-10285-consolidated-merge-patch-02.patch,
> HDFS-10285-consolidated-merge-patch-03.patch,
> HDFS-SPS-TestReport-20170708.pdf,
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf,
> Storage-Policy-Satisfier-in-HDFS-May10.pdf,
> Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policy. These
> policies can be set on a directory/file to specify the user's preference for
> where to store the physical blocks. When the user sets the storage policy
> before writing data, the blocks can take advantage of the storage policy
> preferences and be stored accordingly.
> If the user sets the storage policy after the file has been written and
> completed, then the blocks will have been written with the default storage
> policy (namely DISK). The user has to run the ‘Mover tool’ explicitly,
> specifying all such file names as a list. In some distributed system
> scenarios (e.g. HBase) it would be difficult to collect all the files and
> run the tool, as different nodes can write files independently and the files
> can have different paths.
> Another scenario is when the user renames a file from a directory with one
> effective storage policy (inherited from the parent directory) into a
> directory with a different storage policy. The inherited storage policy is
> not copied from the source, so the file takes the storage policy of the
> destination parent. This rename operation is just a metadata change in the
> Namenode; the physical blocks still remain under the source storage policy.
> So, tracking all such file names across distributed nodes (e.g. region
> servers) based on business logic, and then running the Mover tool, could be
> difficult for admins. The proposal here is to provide an API in the Namenode
> itself to trigger storage policy satisfaction. A daemon thread inside the
> Namenode should track such calls and dispatch movement commands to the DNs.
> Will post the detailed design thoughts document soon.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)