[ 
https://issues.apache.org/jira/browse/HADOOP-9287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13574960#comment-13574960
 ] 

Andrey Klochkov commented on HADOOP-9287:
-----------------------------------------

By coincidence I've been working on this recently. As Chris points out, 
just turning on parallel testing in Surefire would lead to various problems, 
as the current tests are not ready to be run this way. So I fixed 
hadoop-common-project/hadoop-common and hadoop-hdfs-project/hadoop-hdfs to 
allow such execution, and the results seem positive.

The number of changes required to remove contention among tests is not small, 
but the changes are straightforward. Parallel execution can be turned on by 
activating the "parallel-tests" profile. The number of forks can be tuned with 
-DtestsThreadCount (4 is the default).

Most of the changes in hadoop-common are related to FileContextTestHelper and 
FileSystemTestHelper -- some static methods are turned into instance 
methods so that tests use different directories by default. Tests that depend 
on these classes are changed accordingly.
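
A minimal sketch of the static-to-instance idea, not the code from the patch: 
TestDirHelperSketch and pathFor are hypothetical names, and the real helpers 
have a different API. Each helper instance owns its own root directory, so two 
tests running in separate forks no longer resolve paths against a shared root.

```java
import java.io.File;
import java.util.UUID;

// Hypothetical helper for illustration only; not FileSystemTestHelper itself.
class TestDirHelperSketch {
    // Per-instance root: this state used to be effectively static and shared.
    private final File testRootDir;

    // Default: pick a fresh root, unique even across forked JVMs.
    TestDirHelperSketch() {
        this(new File(System.getProperty("java.io.tmpdir"),
                      "test-" + UUID.randomUUID()));
    }

    // Explicit root, for tests that really do need to share a directory.
    TestDirHelperSketch(File testRootDir) {
        this.testRootDir = testRootDir;
    }

    // Instance method: resolves a name under this helper's own root.
    File pathFor(String name) {
        return new File(testRootDir, name);
    }

    public static void main(String[] args) {
        TestDirHelperSketch a = new TestDirHelperSketch();
        TestDirHelperSketch b = new TestDirHelperSketch();
        // The two paths differ because each helper has its own root.
        System.out.println(a.pathFor("data"));
        System.out.println(b.pathFor("data"));
    }
}
```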

Most of the changes in hadoop-hdfs are related to MiniDFSCluster. Previously, 
most tests used the same dir for MiniDFSCluster data. The modifications make 
every MiniDFSCluster instance use a new dir by default. When several 
instances need to share a dir, it has to be set explicitly using 
MiniDFSCluster.Builder.dfsBaseDir(dfsBaseDir).
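
To illustrate the "new dir by default, shared dir only on request" behavior, 
here is a self-contained sketch. MiniClusterSketch is a hypothetical stand-in 
and does not use the Hadoop classes; only the dfsBaseDir setter name is taken 
from the description above, and its File parameter type is an assumption.

```java
import java.io.File;
import java.util.UUID;

// Hypothetical stand-in for a mini cluster; it only tracks its base dir.
class MiniClusterSketch {
    private final File baseDir;

    private MiniClusterSketch(File baseDir) {
        this.baseDir = baseDir;
    }

    File getBaseDir() {
        return baseDir;
    }

    static class Builder {
        // null means "not set explicitly": build() then picks a fresh dir.
        private File dfsBaseDir;

        Builder dfsBaseDir(File dir) {
            this.dfsBaseDir = dir;
            return this;
        }

        MiniClusterSketch build() {
            File dir = (dfsBaseDir != null)
                ? dfsBaseDir
                // A UUID rather than a counter, since forks are separate JVMs.
                : new File(System.getProperty("java.io.tmpdir"),
                           "dfs-" + UUID.randomUUID());
            return new MiniClusterSketch(dir);
        }
    }

    public static void main(String[] args) {
        // Two default instances get distinct dirs.
        System.out.println(new Builder().build().getBaseDir());
        System.out.println(new Builder().build().getBaseDir());
        // Sharing a dir has to be requested explicitly.
        File shared = new File(System.getProperty("java.io.tmpdir"), "dfs-shared");
        System.out.println(new Builder().dfsBaseDir(shared).build().getBaseDir());
    }
}
```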

As far as I know, MiniDFSCluster is used in other projects like HBase, so 
changing its API and default behavior may cause issues there. So I left all 
existing methods intact, marked some of them as deprecated, and introduced an 
environment variable that switches the new behavior on; by default the old 
single-dir behavior is active.
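
The opt-in switch could look roughly like the sketch below. The variable name 
SKETCH_UNIQUE_TEST_DIRS and the class BaseDirPolicySketch are made up for 
illustration; the actual variable name used in the patch is not given here.

```java
import java.io.File;
import java.util.UUID;

// Hypothetical policy class; the names are illustrative, not from the patch.
class BaseDirPolicySketch {
    // Assumed variable name, for illustration only.
    static final String FLAG = "SKETCH_UNIQUE_TEST_DIRS";

    // flagValue is passed in (rather than read here) to keep this testable.
    static File chooseBaseDir(String flagValue, File legacySharedDir) {
        // Boolean.parseBoolean(null) is false, so an unset variable keeps
        // the old single-dir behavior that downstream projects rely on.
        if (Boolean.parseBoolean(flagValue)) {
            return new File(legacySharedDir, UUID.randomUUID().toString());
        }
        return legacySharedDir;
    }

    public static void main(String[] args) {
        File legacy = new File(System.getProperty("java.io.tmpdir"), "dfs");
        System.out.println(chooseBaseDir(System.getenv(FLAG), legacy));
    }
}
```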

Currently it takes 7min to run the hadoop-common tests with 4 parallel forks 
on my 4-core laptop, vs 15min in sequential mode. For hdfs it's 42min vs 
1hr 39min. The improvement may be even bigger on a CI node with many cores.
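
For reference, those timings work out to speedups of roughly 2.1x for 
hadoop-common and 2.4x for hdfs, i.e. a bit over half of the ideal 4x for 4 
forks. The small sketch below just does that arithmetic; SpeedupSketch is an 
illustrative name and the inputs are the minute counts quoted above.

```java
// Illustrative arithmetic only, using the timings quoted in this comment.
class SpeedupSketch {
    // Ratio of sequential to parallel wall-clock time.
    static double speedup(double sequentialMin, double parallelMin) {
        return sequentialMin / parallelMin;
    }

    // Fraction of the ideal linear speedup achieved with the given forks.
    static double efficiency(double speedup, int forks) {
        return speedup / forks;
    }

    public static void main(String[] args) {
        double common = speedup(15, 7);   // hadoop-common: 15min vs 7min
        double hdfs = speedup(99, 42);    // hdfs: 1hr 39min (99min) vs 42min
        System.out.printf("hadoop-common: %.2fx, efficiency %.2f%n",
                          common, efficiency(common, 4));
        System.out.printf("hadoop-hdfs:   %.2fx, efficiency %.2f%n",
                          hdfs, efficiency(hdfs, 4));
    }
}
```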

I'm still in the process of testing this. In particular, I'm going to verify 
projects that depend on the Mini cluster infrastructure, like HBase, Pig and 
Hive.

My existing patch is for both hadoop-common and hadoop-hdfs. The tests in these 
modules are coupled and changing one without changing the other wouldn't work. 
Tsuyoshi, do you mind if I change the title of this task to add HDFS and 
reassign it to myself? 

                
> Parallel testing hadoop-auth and hadoop-common
> ----------------------------------------------
>
>                 Key: HADOOP-9287
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9287
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 3.0.0
>            Reporter: Tsuyoshi OZAWA
>            Assignee: Tsuyoshi OZAWA
>         Attachments: HADOOP-9287.1.patch
>
>
> The maven surefire plugin supports a parallel testing feature. By using it, 
> the tests can be run faster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
