[
https://issues.apache.org/jira/browse/HADOOP-9287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663101#comment-13663101
]
Chris Nauroth commented on HADOOP-9287:
---------------------------------------
{quote}
As part of this effort, it would be good to enumerate patterns that can cause
concurrent tests to fail
{quote}
This sounds like good material for the code review checklist.
http://wiki.apache.org/hadoop/CodeReviewChecklist
{quote}
Instead of changing individual tests to use unique test folder paths, couldn't
we just reconfigure test.build.data from the outside (from maven)?
{quote}
This would be very convenient, but unfortunately, I can't think of a way to
make it work. The problem is that our pom.xml code hands over control to
maven-surefire-plugin, which then iterates through each test suite class and
executes them. When execution enters maven-surefire-plugin, the Maven
properties are frozen at a specific state. I don't believe there is any way
for our pom.xml code to take back control from maven-surefire-plugin between
test suite iterations to generate a different unique ID. Maybe a custom JUnit
runner could do it? At that point, it might be more trouble than it's worth.
Does anyone else have ideas on this? I'm also not aware of any built-in unique
ID property or external plugins that generate unique IDs, so we might end up
needing to code another custom plugin of our own.
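For comparison, the per-test alternative mentioned in the quoted question (each test choosing its own unique folder under test.build.data) could look roughly like the sketch below. This is only an illustration, not code from the patch: the {{UniqueTestDir}} class, the {{uniqueTestDir}} helper, and the {{target/test/data}} fallback are all hypothetical names chosen for the example.

```java
import java.io.File;
import java.util.UUID;

public class UniqueTestDir {
    // Hypothetical helper; not part of Hadoop or of the HADOOP-9287 patch.
    // Builds a directory path that differs on every call, so tests running
    // concurrently do not read or write each other's files.
    public static File uniqueTestDir(Class<?> testClass) {
        // test.build.data is the property discussed above; the fallback
        // value used when it is unset is an assumption for this sketch.
        String base = System.getProperty("test.build.data", "target/test/data");
        // A random UUID suffix keeps the name unique per call without any
        // coordination between threads or forked JVMs.
        return new File(base, testClass.getSimpleName() + "-" + UUID.randomUUID());
    }

    public static void main(String[] args) {
        // Two calls for the same class yield two different paths.
        System.out.println(uniqueTestDir(UniqueTestDir.class));
        System.out.println(uniqueTestDir(UniqueTestDir.class));
    }
}
```

The helper only computes a path; a test would still need to create the directory in its setup and delete it in its teardown.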
{quote}
Chris Nauroth, have you had a chance to kick the tires on this patch for
Windows?
{quote}
Results look good so far. First, I ran the tests without the parallel-tests
profile enabled. As expected, this caused no harm to the test results on
Windows. That's a great sign!
Next, I enabled parallel-tests with the default thread count of 4. Performance
improvement was similar to what is reported here: from ~15 minutes down to ~8
minutes, and this is on a fairly wimpy VM. I did see some new failures though:
# There were failures due to test timeouts in {{TestCopyPreserveFlag}}
({{testPutWithP}}, {{testPutWithoutP}}, {{testGetWithP}}, {{testGetWithoutP}}) and
{{TestLocalFileSystem}} ({{testWorkingDirectory}}, {{testCopy}}). These all have very
short timeouts (1s). I suspect that multi-threaded execution introduced a bit
of context-switching overhead that just barely pushed these tests over the
timeout. I recommend increasing these timeouts to 10s. Unfortunately, this
suggests that the combination of tight timeout settings and parallel execution
could be another source of flaky test results in the future.
# {{TestTFileNoneCodecsByteArrays#testFailureNegativeLength_3}} failed with an
EOFException, which makes me think that two tests tried to share a file or
directory and saw unexpected data. This test class inherits from a base class, and I see
that the code changes in the base class should have prevented a sharing
problem, but perhaps we missed something. I think we ought to investigate this
one before committing. It's probably not a Windows problem, but rather just a
coincidence that the problem manifested on a Windows machine.
[~aklochkov], thanks again for sticking with this issue and responding to the
feedback. This is going to be a big help for developer productivity. I got
pretty excited when the common tests finished so quickly on my machine! :-)
> Parallel testing hadoop-common
> ------------------------------
>
> Key: HADOOP-9287
> URL: https://issues.apache.org/jira/browse/HADOOP-9287
> Project: Hadoop Common
> Issue Type: Test
> Components: test
> Affects Versions: 3.0.0
> Reporter: Tsuyoshi OZAWA
> Assignee: Andrey Klochkov
> Attachments: HADOOP-9287.1.patch, HADOOP-9287--N3.patch,
> HADOOP-9287--N3.patch, HADOOP-9287--N4.patch, HADOOP-9287--N5.patch,
> HADOOP-9287.patch, HADOOP-9287.patch
>
>
> The Maven Surefire plugin supports a parallel testing feature. By using it,
> the tests can be run faster.