[
https://issues.apache.org/jira/browse/HADOOP-14036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Mackrory updated HADOOP-14036:
-----------------------------------
Attachment: HADOOP-14036-HADOOP-13345.000.patch
I'm not proposing this for inclusion just yet (although it's possible this is
precisely the correct fix); it's just a proof of concept of the problem. I see
paths getting added to the collection of objects to move both in the loop I'm
modifying and again below the comment, "We moved all the children, now move the
top-level dir."
I should dig into the listObjects call a bit, since I'm curious why we don't
hit this problem in far more tests and workloads that involve renames. I'm also
not entirely sure we actually have to move the top-level dir last (although my
current fix ensures it is added last). If the move isn't atomic, the invariant
that parent paths always exist will be violated at some point for either the
new path or the old path, and this particular operation only adds the entry to
the collection that later gets broken into batches. Doing it last, as we do
now, seems cleaner IMO, but I want to think it through a bit more. Speak up if
you have any insight or opinions there...
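To illustrate the idea (this is a hypothetical sketch, not the actual
DynamoDBMetadataStore code): if the paths destined for the batched write are
collected in a set that preserves insertion order, the top-level dir can't show
up twice even when listObjects returns the directory key alongside its
children, and it can still be appended last. The class and method names below
are made up for illustration only.
{code:java}
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

/**
 * Hypothetical sketch of collecting rename targets without duplicates,
 * while keeping the top-level directory as the final entry.
 */
public class MovePathCollector {

  // Insertion-ordered set: re-adding an existing path is a no-op.
  private final LinkedHashSet<String> pathsToMove = new LinkedHashSet<>();

  /** Called once per child listed under the source directory. */
  public void addChild(String childPath) {
    pathsToMove.add(childPath);
  }

  /** "We moved all the children, now move the top-level dir." */
  public void addTopLevelDirLast(String dirPath) {
    pathsToMove.remove(dirPath);   // drop any earlier occurrence...
    pathsToMove.add(dirPath);      // ...and re-append so it ends up last
  }

  /** Snapshot, ready to be split into batches for the metadata store. */
  public List<String> getOrderedPaths() {
    return new ArrayList<>(pathsToMove);
  }

  public static void main(String[] args) {
    MovePathCollector c = new MovePathCollector();
    // listObjects may include the directory key itself as well as the
    // children, which is one way the same path gets added twice.
    c.addChild("dest/dir/");
    c.addChild("dest/dir/file1");
    c.addChild("dest/dir/file2");
    c.addTopLevelDirLast("dest/dir/");
    System.out.println(c.getOrderedPaths());
    // prints: [dest/dir/file1, dest/dir/file2, dest/dir/]
  }
}
{code}
With the duplicates removed before the entries are broken into batches, the
DynamoDB batch write should never see the same item key twice, which is what
the ValidationException below complains about.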
> S3Guard: intermittent duplicate item keys failure
> -------------------------------------------------
>
> Key: HADOOP-14036
> URL: https://issues.apache.org/jira/browse/HADOOP-14036
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: HADOOP-13345
> Reporter: Aaron Fabbri
> Assignee: Mingliang Liu
> Attachments: HADOOP-14036-HADOOP-13345.000.patch
>
>
> I see this occasionally when running integration tests with -Ds3guard
> -Ddynamo:
> {noformat}
> testRenameToDirWithSamePrefixAllowed(org.apache.hadoop.fs.s3a.ITestS3AFileSystemContract)
> Time elapsed: 2.756 sec <<< ERROR!
> org.apache.hadoop.fs.s3a.AWSServiceIOException: move:
> com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: Provided
> list of item keys contains duplicates (Service: AmazonDynamoDBv2; Status
> Code: 400; Error Code: ValidationException; Request ID:
> QSBVQV69279UGOB4AJ4NO9Q86VVV4KQNSO5AEMVJF66Q9ASUAAJG): Provided list of item
> keys contains duplicates (Service: AmazonDynamoDBv2; Status Code: 400; Error
> Code: ValidationException; Request ID:
> QSBVQV69279UGOB4AJ4NO9Q86VVV4KQNSO5AEMVJF66Q9ASUAAJG)
> at
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:178)
> at
> org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.move(DynamoDBMetadataStore.java:408)
> at
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerRename(S3AFileSystem.java:869)
> at
> org.apache.hadoop.fs.s3a.S3AFileSystem.rename(S3AFileSystem.java:662)
> at
> org.apache.hadoop.fs.FileSystemContractBaseTest.rename(FileSystemContractBaseTest.java:525)
> at
> org.apache.hadoop.fs.FileSystemContractBaseTest.testRenameToDirWithSamePrefixAllowed(FileSystemContractBaseTest.java:669)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcces
> {noformat}