[
https://issues.apache.org/jira/browse/HADOOP-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16934024#comment-16934024
]
Siddharth Seth edited comment on HADOOP-16207 at 9/20/19 4:00 AM:
------------------------------------------------------------------
Also, to run the tests in parallel, each job needs to use a different
directory name. Currently, all of them use testMRJob (the method name in the
common class that all tests inherit from).
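A minimal sketch of one way to get distinct names, assuming a hypothetical
helper (TestJobDirs and uniqueJobDir are illustrative names, not part of the
existing test code). It derives the directory name from the concrete test
class rather than from the inherited method name alone:

```java
import java.util.UUID;

/** Hypothetical helper: builds a job directory name unique to each test class and run. */
final class TestJobDirs {
  private TestJobDirs() {}

  /**
   * Combines the concrete test class name, the method name, and a short random
   * suffix, so subclasses that inherit the same test method no longer collide.
   */
  static String uniqueJobDir(Class<?> testClass, String methodName) {
    // The first 8 characters of a random UUID are hex digits.
    String suffix = UUID.randomUUID().toString().substring(0, 8);
    return testClass.getSimpleName() + "-" + methodName + "-" + suffix;
  }
}
```

A test in the common base class could call
TestJobDirs.uniqueJobDir(getClass(), "testMRJob") so that each subclass ends up
with its own directory.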
The local dir conflict is an MR configuration issue afaik (likely the MR tmp
dir config property). YARN clusters should already be able to run in parallel
(different ports, random dir names, etc.).
I'd also be careful trying to run too many of these in parallel, given the
amount of memory they consume. Maybe a different parallelism flag for any tests
running on a cluster?
How about simplifying the code and letting the tests reside in the same class,
which makes the code easier to read and allows sharing a cluster more easily?
Haven't seen the WIP patch, but sharing a cluster across different tests,
which may or may not run at the same time, seems like it may cause problems.
The tests also use a 1 s sleep for the InconsistentFS to get into a consistent
state. That can lead to flakiness in the tests. I'd suggest a longer sleep,
unless InconsistentFS can be set up with an actual waitForConsistency method
that is not time based.
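As a rough sketch of the alternative to a fixed sleep, the helper below polls a
condition until it holds. ConsistencyWait and awaitCondition are hypothetical
names, not existing InconsistentFS or test APIs, and the condition passed in
(for example, a check that the output path is visible) is an assumption. It
still has an upper time bound, but the test no longer depends on a fixed delay
being long enough:

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical test helper: waits on a condition rather than a fixed sleep. */
final class ConsistencyWait {
  private ConsistencyWait() {}

  /**
   * Polls the condition every interval until it returns true.
   * Fails with an AssertionError once the timeout has elapsed.
   */
  static void awaitCondition(BooleanSupplier condition, Duration timeout, Duration interval) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new AssertionError("condition not met within " + timeout);
      }
      try {
        Thread.sleep(interval.toMillis());
      } catch (InterruptedException e) {
        // Restore the interrupt flag and stop waiting.
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting", e);
      }
    }
  }
}
```

The condition is checked before the first sleep, so a store that is already
consistent costs no waiting at all.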
> Fix ITestDirectoryCommitMRJob.testMRJob
> ---------------------------------------
>
> Key: HADOOP-16207
> URL: https://issues.apache.org/jira/browse/HADOOP-16207
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3, test
> Affects Versions: 3.3.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Critical
>
> Reported failure of {{ITestDirectoryCommitMRJob}} in validation runs of
> HADOOP-16186; assertIsDirectory with s3guard enabled and a parallel test run:
> Path "is recorded as deleted by S3Guard"
> {code}
> waitForConsistency();
> assertIsDirectory(outputPath); /* here */
> {code}
> The file is there but there's a tombstone. Possibilities:
> * some race condition with another test
> * tombstones aren't timing out
> * committers aren't creating that base dir in a way which cleans up S3Guard's
> tombstones.
> Remember: we do have to delete that dest dir before the committer runs unless
> overwrite==true, so at the start of the run there will be a tombstone. It
> should be overwritten by a success.