[
https://issues.apache.org/jira/browse/HADOOP-14215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Mackrory updated HADOOP-14215:
-----------------------------------
Attachment: HADOOP-14215-HADOOP-13345.002.patch
Attaching a patch with the test added. The test borrows heavily from how
ITestS3AConcurrentOps ended up doing things.
The test's behavior varies wildly between runs, but I think I'm almost
satisfied. I'm going to do a bunch more test runs just to be more certain. Last
night I was able to reproduce the problem in 1 or 2 threads per iteration,
within 1 or 2 iterations. This morning I had a bunch of test runs that couldn't
reproduce the problem at all. This afternoon I saw all but 1 thread (sometimes
all threads, which was weird; see below) hit the error in every single
iteration.
Immediately after a bunch of test runs that hit the problem very severely, I
applied your patch. The runs then succeeded completely about 75% of the time,
and once in a while a single thread had a failure. I then
realized I was missing your retry-until-version-marker-is-present fix. I added
that and I haven't had a failure since. I mention all that detail because I'm
afraid I didn't think to really dig into the logs on the occasional cases when
not a single thread failed, so I'm only guessing that those were the cases that
didn't hit the UPDATING state, but that did lose the race condition with the
version marker.
So again, I'm going to do a bunch more test runs to be sure, but I think things
are looking good. +1 to your changes based on a visual code review, too.
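
For anyone following along, here is a minimal sketch of the
retry-until-version-marker-is-present idea, not the code from the patch. It
assumes the marker lookup can be modeled as a Supplier that returns null while
the marker is missing; the class name, method name and parameters are all
hypothetical.

```java
import java.util.function.Supplier;

/** Sketch only: hypothetical names, not the code from the attached patch. */
public class VersionMarkerRetrySketch {

    /**
     * Polls readMarker until it returns a non-null value or the attempts run
     * out. readMarker is an assumed stand-in for "look up the version marker
     * item in the table"; it returns null while the marker is absent.
     */
    static String waitForVersionMarker(Supplier<String> readMarker,
                                       int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String marker = readMarker.get();
            if (marker != null) {
                return marker;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and give up.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting", e);
                }
            }
        }
        throw new IllegalStateException(
            "Version marker still missing after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        // Simulated lookup: the marker only shows up on the third read.
        int[] reads = {0};
        String marker = waitForVersionMarker(
            () -> ++reads[0] < 3 ? null : "marker-present", 5, 10L);
        System.out.println(marker + " after " + reads[0] + " reads");
    }
}
```

The bounded attempt count is a design choice of this sketch: a client that
never sees the marker fails with a clear error instead of hanging.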
> DynamoDB client should waitForActive on existing tables
> -------------------------------------------------------
>
> Key: HADOOP-14215
> URL: https://issues.apache.org/jira/browse/HADOOP-14215
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Sean Mackrory
> Assignee: Sean Mackrory
> Priority: Critical
> Attachments: HADOOP-14215-HADOOP-13345.000.patch,
> HADOOP-14215-HADOOP-13345.001.patch, HADOOP-14215-HADOOP-13345.002.patch
>
>
> I saw a case where 2 separate applications tried to use the same
> non-pre-existing table with table.create = true at about the same time. One
> failed with a ResourceInUse exception. If a table does not exist, we attempt
> to create it and then wait for it to enter the active state. If another
> application jumps in in the middle of that, it may find that the table
> already exists, bypass our call to waitForActive(), and then try to use the
> table immediately.
> While we're at it, let's also make sure that the race condition where a table
> might get created between checking if it exists and attempting to create it
> is handled gracefully.
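
The behavior requested above can be sketched roughly as follows. This is an
illustration under stated assumptions, not the S3Guard code or the AWS SDK API:
TableService, FakeService, TableExistsException and initTable are hypothetical
names, and the remote table is replaced by an in-memory fake. The two points it
tries to show are that the wait happens whether or not this client created the
table, and that losing the check-then-create race is treated as "someone else
created it" rather than as a failure.

```java
/** Sketch only: every name here is hypothetical, not the S3Guard or AWS SDK API. */
public class TableInitSketch {

    enum Status { CREATING, UPDATING, ACTIVE }

    /** Thrown when a create request targets a table that already exists. */
    static class TableExistsException extends RuntimeException { }

    /** Minimal stand-in for a remote table service. */
    interface TableService {
        Status describe(String name);  // returns null when the table does not exist
        void create(String name);      // throws TableExistsException if it exists
    }

    /** Ensure the table exists and is ACTIVE before the caller uses it. */
    static void initTable(TableService svc, String name, int maxPolls, long sleepMillis) {
        if (svc.describe(name) == null) {
            try {
                svc.create(name);
            } catch (TableExistsException e) {
                // Another client created the table between our check and our
                // create call; fall through and wait like everyone else.
            }
        }
        // Wait unconditionally, including when the table already existed.
        waitForActive(svc, name, maxPolls, sleepMillis);
    }

    static void waitForActive(TableService svc, String name, int maxPolls, long sleepMillis) {
        for (int i = 0; i < maxPolls; i++) {
            if (svc.describe(name) == Status.ACTIVE) {
                return;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
        throw new IllegalStateException("Table " + name + " did not become ACTIVE");
    }

    /** In-memory fake: a non-ACTIVE table turns ACTIVE after a few describe calls. */
    static class FakeService implements TableService {
        private Status status;               // null means "no such table"
        private int pollsUntilActive;
        private final boolean loseCreateRace;

        FakeService(Status initial, int pollsUntilActive, boolean loseCreateRace) {
            this.status = initial;
            this.pollsUntilActive = pollsUntilActive;
            this.loseCreateRace = loseCreateRace;
        }

        @Override
        public synchronized Status describe(String name) {
            if (status != null && status != Status.ACTIVE && pollsUntilActive-- <= 0) {
                status = Status.ACTIVE;
            }
            return status;
        }

        @Override
        public synchronized void create(String name) {
            if (loseCreateRace) {
                // Simulate a competing client creating the table first.
                status = Status.CREATING;
                throw new TableExistsException();
            }
            if (status != null) {
                throw new TableExistsException();
            }
            status = Status.CREATING;
        }
    }

    public static void main(String[] args) {
        FakeService existing = new FakeService(Status.CREATING, 3, false);
        initTable(existing, "example-table", 10, 10L);
        System.out.println("existing table: " + existing.describe("example-table"));

        FakeService raced = new FakeService(null, 2, true);
        initTable(raced, "example-table", 10, 10L);
        System.out.println("raced table: " + raced.describe("example-table"));
    }
}
```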