[jira] [Commented] (HDFS-12345) Scale testing HDFS NameNode with real metadata and workloads (Dynamometer)

Erik Krogen (JIRA) Wed, 27 Mar 2019 15:53:55 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803428#comment-16803428
 ]


Erik Krogen commented on HDFS-12345:
------------------------------------

Hey [~daryn], thanks for raising some valid concerns.

Regarding the use of private APIs, it is pretty limited. On the YARN side, 
there are a few now within the ApplicationMaster, simply because it was copied 
from the DistributedShell example, which also uses such private APIs; I'm 
happy to fix those before pushing. On the HDFS side, I believe the only use of 
private APIs is within the {{SimulatedDataNodes}} class, which does some 
manipulation to inject blocks into a {{MiniDFSCluster}}. Given that 
{{DataNodeCluster}} does something very similar, I would expect any changes in 
this area to be handled very similarly and not be much extra maintenance 
burden. If there is a desire, I can probably combine {{SimulatedDataNodes}} and 
{{DataNodeCluster}} so that any maintenance would only be in one place; IIRC, 
the main reason I did not do so already is that besides the actual block 
injection, most code within {{DataNodeCluster}} is devoted to deciding how to 
generate all of the blocks, whereas {{SimulatedDataNodes}} reads its blocks 
from an already-generated listing. But I can look into combining them.

Regarding maintenance burden, I guess most of the tools within {{hadoop-tools}} 
are "non-essential" maintenance burdens. I think [~smeng]'s idea of disabling 
compilation by default is interesting, but I worry that this would increase the 
chance that the tool falls out of sync with the main project and becomes even 
harder to maintain. I would definitely be open to feedback in this area.

> Scale testing HDFS NameNode with real metadata and workloads (Dynamometer)
> --------------------------------------------------------------------------
>
>                 Key: HDFS-12345
>                 URL: https://issues.apache.org/jira/browse/HDFS-12345
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: namenode, test
>            Reporter: Zhe Zhang
>            Assignee: Siyao Meng
>            Priority: Major
>         Attachments: HDFS-12345.000.patch, HDFS-12345.001.patch, 
> HDFS-12345.002.patch, HDFS-12345.003.patch, HDFS-12345.004.patch, 
> HDFS-12345.005.patch
>
>
> Dynamometer has now been open sourced on our [GitHub 
> page|https://github.com/linkedin/dynamometer]. Read more at our [recent blog 
> post|https://engineering.linkedin.com/blog/2018/02/dynamometer--scale-testing-hdfs-on-minimal-hardware-with-maximum].
> To encourage getting the tool into the open for others to use as quickly as 
> possible, we went through our standard open sourcing process of releasing on 
> GitHub. However, we are interested in the possibility of donating this to 
> Apache as part of Hadoop itself and would appreciate feedback on whether or 
> not this is something that would be supported by the community.
> Also of note, previous [discussions on the dev mail 
> lists|http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201707.mbox/%[email protected]%3e]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

