[jira] [Updated] (HDFS-12345) Scale testing HDFS NameNode with real metadata and workloads (Dynamometer)

Erik Krogen (JIRA) Tue, 15 Jan 2019 08:55:12 -0800


     [ https://issues.apache.org/jira/browse/HDFS-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Erik Krogen updated HDFS-12345:
-------------------------------
    Status: Patch Available  (was: In Progress)

It took a while to get to this, but I'm happy to attach an initial attempt at 
moving Dynamometer from our GitHub repository into Hadoop Tools! It builds, is 
included in the distribution, and the tests pass (at least locally; we'll see 
whether Jenkins agrees).

This is based on the [{{ekrogen-hadoop-3-support}} branch of 
Dynamometer|https://github.com/xkrogen/dynamometer/tree/ekrogen-hadoop-3-support], 
which is a patch on top of the master branch that adds Hadoop 3 support. 
I think a reasonable way forward may be to leave the GitHub repo Hadoop 2 
compatible and maintain the version within Tools for Hadoop 3+.

There are still some major outstanding tasks before this can be committed:
* The documentation hasn't yet been moved to where it needs to live to be built 
as part of the Hadoop site
* I'm not entirely confident that the packaging strategy I've used, with an 
overall {{hadoop-dynamometer}} module containing the same three submodules as 
the GitHub repo, is the right approach. Comments are welcome.
* The code style doesn't match Hadoop's (in particular, many lines exceed 
Hadoop's line-length limit, so a lot of reformatting will be needed)
* I'm not sure whether the system properties the tests need are being passed 
through properly
* I/we need to make a decision about version compatibility. Dynamometer was 
designed to be able to run multiple versions of Hadoop from a single 
Dynamometer release. Does this still make sense now that Dynamometer is within 
Hadoop itself? I think so, to accommodate scenarios where you have a cluster 
running Hadoop version X but you want to test out what an upgrade to Hadoop 
version Y might look like.

In addition to these blocking items, I think a few tasks are well suited to 
being handled as follow-ups:
* Currently the Dynamometer tests always download a Hadoop tarball (caching it 
locally between runs); this can be overridden by a system property. The tests 
should probably use the local build when possible.
* As Wei-Chiu mentioned above, we need proper unit testing (there is mostly one 
big integration test for now) and support for more features.

> Scale testing HDFS NameNode with real metadata and workloads (Dynamometer)
> --------------------------------------------------------------------------
>
>                 Key: HDFS-12345
>                 URL: https://issues.apache.org/jira/browse/HDFS-12345
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: namenode, test
>            Reporter: Zhe Zhang
>            Assignee: Erik Krogen
>            Priority: Major
>         Attachments: HDFS-12345.000.patch
>
>
> Dynamometer has now been open sourced on our [GitHub 
> page|https://github.com/linkedin/dynamometer]. Read more at our [recent blog 
> post|https://engineering.linkedin.com/blog/2018/02/dynamometer--scale-testing-hdfs-on-minimal-hardware-with-maximum].
> To encourage getting the tool into the open for others to use as quickly as 
> possible, we went through our standard open-sourcing process of releasing on 
> GitHub. However, we are interested in the possibility of donating this to 
> Apache as part of Hadoop itself and would appreciate feedback on whether or 
> not this is something that would be supported by the community.
> Also of note: previous [discussions on the dev mail 
> lists|http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201707.mbox/%[email protected]%3e]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
