[
https://issues.apache.org/jira/browse/HADOOP-15079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16322219#comment-16322219
]
Steve Loughran commented on HADOOP-15079:
-----------------------------------------
bq. Looks gift horse in the mouth "How did these unrelated annotation changes
get in here
Because I went through the entire call chain of what rename was up to and what
their policy was, when things were unmarked their policy was reviewed &
updated. As noted, it's some s3guard translation which is what we need to wrap
up (straightforward).
bq. us I think you've changed listLocatedStatus() to Mixed.
OK, I thought it was superfluous, but I will (a) trace it throough and (b)
reinstate anyway. (Done)
Also: added annotations for Once/Retry ExceptionsSwallowed, and tagged where
this is done in the dir/finishWrite calls.
I can see this becoming a maintenance PITA, but it does force us to be
consistent. I wonder if you could do some tool which
analysed transient use of invoked methods for consistency...but you'd need to
look at try/catch logic too. Hard.
bq. Do you want an {{if (outcome) }} condition around maybeCreateFakeParent()?
good idea. Done
bq. . Now that finishedWrite() does S3Guard.addAncestors() (IIRC it did not
originally), maybe we don't need this and the whole makeDirsOrdered() thing can
go away. A FS with real directories would still probably want to use that
function but .. it'll always be in git history. I'm fine with trying it out
here, or doing a separate JIRA, 'sup to you.
Let's do it here, but then we'll need the extra review of that. Note that
removing it reduces the IOPs used, helping keep DDB costs down: tangible
benefits
bq. Wow, good catch. Looks like that has been there since S3A was committed.
We just have to accept that sometimes, code you consider stable turns out to
have bugs of some kind or other. Like, say, CPU parts :)
This is something we can/should backport on its own, as it is adding time &
cost to every rename. Once this is in I'll do a micro-backport of the != fix,
not worrying about test coverage.
bq. Your tests sharing a path instance implies the you were renaming from root
dir to root dir (the only case Path#getParent() doesn't allocate a new Path).
Sound right?
Actually, I don't think we have a test for that anywhere; there's nothing in
abstract contract root directory test. Should be though, really. I just assumed
that the mismatch was part of the cause of extra delete calls --maybe it was
just unrelated but while doing the due diligence I noticed it in the debugger.
* Added a new test {{ITestS3AFileOperationCost#testCostOfRootRename()}} to see
what goes on there. Not a lot, thankfully.
* Given this was the first time we've looked at that test suite for a while,
did a quick cleanup, moving to LTU.intercept() where appropriate, and having
the entire suite skipped during fault injection; too much risk of false test
failures.
Also, we don't need to skip that test with s3guard on. now know things are
fixed.
w.r.t {{makeDirsOrdered}} in innerMkDirs: I've cut it, everything seems good.
But I've left the method in S3Guard for now, because it discusses issues there
which are interesting. Maybe
h3. Checkstyle
Fixed important, cut comment too long, and also moved innerRename to
maybeAddTrailingSlash where appropriate...that keeps the line count down there.
h3. Tests
Tested {{ITestS3AFileOperationCost}} the with s3guard, s3guard + auth, s3guard
+ dynamo, s3guard + dynamo + auth. The set.
Did full suite test run with: {{-Dparallel-tests -DtestsThreadCount=12
-Ds3guard -Dauth -Dscale}}
I did *not* see the problems that Sean saw, which I can't even point to a cause
of, especially the {{AbstractS3GuardToolTestBase.run:95 ยป IllegalArgument}}
cases.
Sean, can you do a clean build & test & stick up stack traces?
> ITestS3AFileOperationCost#testFakeDirectoryDeletion failing after
> OutputCommitter patch
> ---------------------------------------------------------------------------------------
>
> Key: HADOOP-15079
> URL: https://issues.apache.org/jira/browse/HADOOP-15079
> Project: Hadoop Common
> Issue Type: Sub-task
> Affects Versions: 3.1.0
> Reporter: Sean Mackrory
> Assignee: Steve Loughran
> Priority: Critical
> Attachments: HADOOP-15079-001.patch, HADOOP-15079-002.patch
>
>
> I see this test failing with "object_delete_requests expected:<1> but
> was:<2>". I printed stack traces whenever this metric was incremented, and
> found the root cause to be that innerMkdirs is now causing two calls to
> delete fake directories when it previously caused only one. It is called once
> inside createFakeDirectory, and once directly inside innerMkdirs later:
> {code}
> at
> org.apache.hadoop.fs.s3a.S3AInstrumentation.incrementCounter(S3AInstrumentation.java:454)
> at
> org.apache.hadoop.fs.s3a.S3AFileSystem.incrementStatistic(S3AFileSystem.java:1108)
> at
> org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$deleteObjects$8(S3AFileSystem.java:1369)
> at
> org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:313)
> at
> org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:279)
> at
> org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObjects(S3AFileSystem.java:1366)
> at
> org.apache.hadoop.fs.s3a.S3AFileSystem.removeKeys(S3AFileSystem.java:1625)
> at
> org.apache.hadoop.fs.s3a.S3AFileSystem.deleteUnnecessaryFakeDirectories(S3AFileSystem.java:2634)
> at
> org.apache.hadoop.fs.s3a.S3AFileSystem.finishedWrite(S3AFileSystem.java:2599)
> at
> org.apache.hadoop.fs.s3a.S3AFileSystem.putObjectDirect(S3AFileSystem.java:1498)
> at
> org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$createEmptyObject$11(S3AFileSystem.java:2684)
> at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:108)
> at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:259)
> at
> org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:313)
> at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:255)
> at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:230)
> at
> org.apache.hadoop.fs.s3a.S3AFileSystem.createEmptyObject(S3AFileSystem.java:2682)
> at
> org.apache.hadoop.fs.s3a.S3AFileSystem.createFakeDirectory(S3AFileSystem.java:2657)
> at
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerMkdirs(S3AFileSystem.java:2021)
> at
> org.apache.hadoop.fs.s3a.S3AFileSystem.mkdirs(S3AFileSystem.java:1956)
> at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2305)
> at
> org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338)
> at
> org.apache.hadoop.fs.s3a.ITestS3AFileOperationCost.testFakeDirectoryDeletion(ITestS3AFileOperationCost.java:209)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> at
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74
> {code}
> {code}
> at
> org.apache.hadoop.fs.s3a.S3AInstrumentation.incrementCounter(S3AInstrumentation.java:454)
> at
> org.apache.hadoop.fs.s3a.S3AFileSystem.incrementStatistic(S3AFileSystem.java:1108)
> at
> org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$deleteObjects$8(S3AFileSystem.java:1369)
> at
> org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:313)
> at
> org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:279)
> at
> org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObjects(S3AFileSystem.java:1366)
> at
> org.apache.hadoop.fs.s3a.S3AFileSystem.removeKeys(S3AFileSystem.java:1625)
> at
> org.apache.hadoop.fs.s3a.S3AFileSystem.deleteUnnecessaryFakeDirectories(S3AFileSystem.java:2634)
> at
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerMkdirs(S3AFileSystem.java:2025)
> at
> org.apache.hadoop.fs.s3a.S3AFileSystem.mkdirs(S3AFileSystem.java:1956)
> at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2305)
> at
> org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338)
> at
> org.apache.hadoop.fs.s3a.ITestS3AFileOperationCost.testFakeDirectoryDeletion(ITestS3AFileOperationCost.java:209)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> at
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {code}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]