[ https://issues.apache.org/jira/browse/HADOOP-19347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901225#comment-17901225 ]

Steve Loughran commented on HADOOP-19347:
-----------------------------------------

Full (sanitized) stack trace

{code}
  
ERROR : Job Commit failed with exception
 
'org.apache.hadoop.hive.ql.metadata.HiveException(org.apache.hadoop.fs.s3a.AWSS3IOException:
 rename 
s3a://bucket/hive/table/warehouse/tablespace/external/hive/table/-tmp.-ext-10000
 to 
s3a://bucket/hive/table/warehouse/tablespace/external/hive/table/-tmp.-ext-10000.moved
 on 
s3a://bucket/hive/table/warehouse/tablespace/external/hive/table/-tmp.-ext-10000:
 org.apache.hadoop.fs.s3a.impl.MultiObjectDeleteException: 
[S3Error(Key=table/warehouse/tablespace/external/hive/table/-tmp.-ext-10000/file/,
 Code=InternalError, Message=We encountered an internal error. Please try 
again.)]: InternalError: 
table/warehouse/tablespace/external/hive/table/-tmp.-ext-10000/file/: We 
encountered an internal error. Please try again.
: 
[S3Error(Key=table/warehouse/tablespace/external/hive/table/-tmp.-ext-10000/file/,
 Code=InternalError, Message=We encountered an internal error. Please try 
again.)])'
org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hadoop.fs.s3a.AWSS3IOException: rename 
s3a://bucket/hive/table/warehouse/tablespace/external/hive/table/-tmp.-ext-10000
 to 
s3a://bucket/hive/table/warehouse/tablespace/external/hive/table/-tmp.-ext-10000.moved
 on 
s3a://bucket/hive/table/warehouse/tablespace/external/hive/table/-tmp.-ext-10000:
 org.apache.hadoop.fs.s3a.impl.MultiObjectDeleteException: 
[S3Error(Key=table/warehouse/tablespace/external/hive/table/-tmp.-ext-10000/file/,
 Code=InternalError, Message=We encountered an internal error. Please try 
again.)]: InternalError: 
table/warehouse/tablespace/external/hive/table/-tmp.-ext-10000/file/: We 
encountered an internal error. Please try again.
: 
[S3Error(Key=table/warehouse/tablespace/external/hive/table/-tmp.-ext-10000/file/,
 Code=InternalError, Message=We encountered an internal error. Please try 
again.)]
        at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1632)
        at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:797)
        at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:802)
        at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:703)
        at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:371)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
        at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
        at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:356)
        at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:329)
        at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
        at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:107)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:809)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:546)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:540)
        at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:190)
        at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
        at 
org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
        at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340)
        at java.base/java.security.AccessController.doPrivileged(Native Method)
        at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
        at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:360)
        at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.hadoop.fs.s3a.AWSS3IOException: rename 
s3a://bucket/hive/table/warehouse/tablespace/external/hive/table/-tmp.-ext-10000
 to 
s3a://bucket/hive/table/warehouse/tablespace/external/hive/table/-tmp.-ext-10000.moved
 on 
s3a://bucket/hive/table/warehouse/tablespace/external/hive/table/-tmp.-ext-10000:
 org.apache.hadoop.fs.s3a.impl.MultiObjectDeleteException: 
[S3Error(Key=table/warehouse/tablespace/external/hive/table/-tmp.-ext-10000/file/,
 Code=InternalError, Message=We encountered an internal error. Please try 
again.)]: InternalError: 
table/warehouse/tablespace/external/hive/table/-tmp.-ext-10000/file/: We 
encountered an internal error. Please try again.
: 
[S3Error(Key=table/warehouse/tablespace/external/hive/table/-tmp.-ext-10000/file/,
 Code=InternalError, Message=We encountered an internal error. Please try 
again.)]
        at 
org.apache.hadoop.fs.s3a.impl.MultiObjectDeleteException.translateException(MultiObjectDeleteException.java:101)
        at 
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:347)
        at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:124)
        at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:163)
        at 
org.apache.hadoop.fs.s3a.impl.RenameOperation.removeSourceObjects(RenameOperation.java:623)
        at 
org.apache.hadoop.fs.s3a.impl.RenameOperation.completeActiveCopiesAndDeleteSources(RenameOperation.java:266)
        at 
org.apache.hadoop.fs.s3a.impl.RenameOperation.recursiveDirectoryRename(RenameOperation.java:456)
        at 
org.apache.hadoop.fs.s3a.impl.RenameOperation.execute(RenameOperation.java:291)
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.innerRename(S3AFileSystem.java:2456)
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$rename$6(S3AFileSystem.java:2305)
        at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:543)
        at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:524)
        at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:445)
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2776)
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.rename(S3AFileSystem.java:2303)
        at org.apache.hadoop.hive.ql.exec.Utilities.rename(Utilities.java:1174)
        at 
org.apache.hadoop.hive.ql.exec.Utilities.mvFileToFinalPath(Utilities.java:1551)
        at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1617)
        ... 28 more
Caused by: org.apache.hadoop.fs.s3a.impl.MultiObjectDeleteException: 
[S3Error(Key=table/warehouse/tablespace/external/hive/table/-tmp.-ext-10000/file/,
 Code=InternalError, Message=We encountered an internal error. Please try 
again.)]
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObjects(S3AFileSystem.java:3186)
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.removeKeysS3(S3AFileSystem.java:3422)
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.removeKeys(S3AFileSystem.java:3481)
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem$OperationCallbacksImpl.removeKeys(S3AFileSystem.java:2558)
        at 
org.apache.hadoop.fs.s3a.impl.RenameOperation.lambda$removeSourceObjects$3(RenameOperation.java:625)
        at org.apache.hadoop.fs.s3a.Invoker.lambda$once$0(Invoker.java:165)
        at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:122)
        ... 43 more
{code}
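
Here is a minimal sketch (not the S3A code itself; the class and method names 
are illustrative) of why these failures are invisible to the SDK's retry 
logic: the DeleteObjects call returns 200 at the HTTP level, and the per-key 
failures only appear in the errors() list of the response, which the caller 
must inspect itself.
{code}
import java.util.List;

import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.Delete;
import software.amazon.awssdk.services.s3.model.DeleteObjectsRequest;
import software.amazon.awssdk.services.s3.model.DeleteObjectsResponse;
import software.amazon.awssdk.services.s3.model.ObjectIdentifier;
import software.amazon.awssdk.services.s3.model.S3Error;

public final class BulkDeleteSketch {

  /**
   * Illustrative helper: issue a bulk delete and return the per-key failures.
   * The HTTP request succeeds (200) even when individual keys fail, so the
   * SDK's retry policy never fires; response.errors() must be checked.
   */
  static List<S3Error> deleteAndCollectErrors(
      S3Client s3, String bucket, List<ObjectIdentifier> keys) {
    DeleteObjectsRequest request = DeleteObjectsRequest.builder()
        .bucket(bucket)
        .delete(Delete.builder().objects(keys).build())
        .build();
    // Succeeds at the HTTP level even if some keys were not deleted.
    DeleteObjectsResponse response = s3.deleteObjects(request);
    // Entries such as Code=InternalError only surface here.
    return response.errors();
  }
}
{code}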


> AWS SDK deleteObjects() and S3Store.deleteObjects() don't handle 500 failures 
> of individual objects
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-19347
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19347
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.4.1
>            Reporter: Steve Loughran
>            Priority: Minor
>
> S3Store.deleteObjects() encountered a 500 error and didn't recover.
> We normally assume that 500 errors are already retried by the SDK, so our 
> own retry logic doesn't bother with them.
> The root cause is that the 500 errors can surface within the body of a bulk 
> delete response:
> * The delete POST returns 200, so the SDK is happy
> * but one of the rows in the response reports the S3Error "InternalError":
> {{Code=InternalError, Message=We encountered an internal error. Please try 
> again.}}
> Proposed:
> * The bulk delete invoker must map "InternalError" to AWSStatus500Exception 
> and throw that.
> * Add a retry policy for bulk deletes which treats AWSStatus500Exception as 
> retriable. We currently don't retry here, on the assumption that the SDK 
> will; it does for whole-request failures, but clearly not for per-key 
> failures within a multi-object delete. A sketch of the proposed handling 
> follows the stack trace below.
> * Maybe also consider the possibility that a partial 503 response could be 
> generated, that is: only part of the delete throttled?
> {code}
> Caused by: org.apache.hadoop.fs.s3a.impl.MultiObjectDeleteException: 
> [S3Error(Key=table/warehouse/tablespace/external/hive/table/-tmp.-ext-10000/file/,
>  Code=InternalError, Message=We encountered an internal error. Please try 
> again.)]
>       at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObjects(S3AFileSystem.java:3186)
>       at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.removeKeysS3(S3AFileSystem.java:3422)
>       at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.removeKeys(S3AFileSystem.java:3481)
>       at 
> org.apache.hadoop.fs.s3a.S3AFileSystem$OperationCallbacksImpl.removeKeys(S3AFileSystem.java:2558)
>       at 
> org.apache.hadoop.fs.s3a.impl.RenameOperation.lambda$removeSourceObjects$3(RenameOperation.java:625)
>       at org.apache.hadoop.fs.s3a.Invoker.lambda$once$0(Invoker.java:165)
>       at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:122)
>   
> {code}
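> As a sketch of the proposed retry handling (not a patch; MAX_ATTEMPTS and 
> the helper name are illustrative, and the real code would raise 
> AWSStatus500Exception through the existing invoker/translation machinery): 
> scan the response for retriable per-key error codes and re-submit only the 
> failed keys.
> {code}
> import java.io.IOException;
> import java.util.List;
> import java.util.stream.Collectors;
>
> import software.amazon.awssdk.services.s3.S3Client;
> import software.amazon.awssdk.services.s3.model.Delete;
> import software.amazon.awssdk.services.s3.model.DeleteObjectsRequest;
> import software.amazon.awssdk.services.s3.model.ObjectIdentifier;
> import software.amazon.awssdk.services.s3.model.S3Error;
>
> public final class RetryingBulkDelete {
>
>   /** Illustrative limit; the real policy would be configurable. */
>   private static final int MAX_ATTEMPTS = 3;
>
>   /**
>    * Delete keys, retrying only those which failed with a retriable
>    * per-key error such as InternalError. The SDK does not retry these
>    * itself because the request as a whole returned 200.
>    */
>   static void deleteWithRetries(S3Client s3, String bucket,
>       List<ObjectIdentifier> keys) throws IOException {
>     List<ObjectIdentifier> remaining = keys;
>     for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
>       List<S3Error> errors = s3.deleteObjects(DeleteObjectsRequest.builder()
>           .bucket(bucket)
>           .delete(Delete.builder().objects(remaining).build())
>           .build())
>           .errors();
>       if (errors.isEmpty()) {
>         return;
>       }
>       // Treat InternalError as retriable; a fuller policy might also
>       // cover a partial 503/SlowDown response.
>       remaining = errors.stream()
>           .filter(e -> "InternalError".equals(e.code()))
>           .map(e -> ObjectIdentifier.builder().key(e.key()).build())
>           .collect(Collectors.toList());
>       if (remaining.size() < errors.size()) {
>         // At least one failure was not retriable: surface it, as the
>         // real code does via MultiObjectDeleteException.
>         throw new IOException("Non-retriable bulk delete failures: " + errors);
>       }
>     }
>     throw new IOException(
>         "Bulk delete still failing after " + MAX_ATTEMPTS + " attempts");
>   }
> }
> {code}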


