Sarunas Valaskevicius created HADOOP-19479: ----------------------------------------------
Summary: S3A Deadlock multipart upload Key: HADOOP-19479 URL: https://issues.apache.org/jira/browse/HADOOP-19479 Project: Hadoop Common Issue Type: Bug Affects Versions: 3.4.1 Reporter: Sarunas Valaskevicius Reproduced while testing system resilience and turning S3 network off (introduced a network partition to the list of IPs S3 uses) {code:java} Found one Java-level deadlock: ============================= "sdk-ScheduledExecutor-2-3": waiting to lock monitor 0x00007f5c880a8630 (object 0x0000000315523c78, a java.lang.Object), which is held by "sdk-ScheduledExecutor-2-4" "sdk-ScheduledExecutor-2-4": waiting to lock monitor 0x00007f5c7c016700 (object 0x0000000327800000, a org.apache.hadoop.fs.s3a.S3ABlockOutputStream), which is held by "io-compute-blocker-15" "io-compute-blocker-15": waiting to lock monitor 0x00007f5c642ae900 (object 0x00000003af0001d8, a java.lang.Object), which is held by "sdk-ScheduledExecutor-2-3" Java stack information for the threads listed above: =================================================== "sdk-ScheduledExecutor-2-3": at java.lang.Thread.interrupt(java.base@21/Thread.java:1717) - waiting to lock <0x0000000315523c78> (a java.lang.Object) at software.amazon.awssdk.core.internal.http.timers.SyncTimeoutTask.run(SyncTimeoutTask.java:60) - locked <0x00000003af0001d8> (a java.lang.Object) at java.util.concurrent.Executors$RunnableAdapter.call(java.base@21/Executors.java:572) at java.util.concurrent.FutureTask.run(java.base@21/FutureTask.java:317) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@21/ScheduledThreadPoolExecutor.java:304) at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@21/ThreadPoolExecutor.java:1144) at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@21/ThreadPoolExecutor.java:642) at java.lang.Thread.runWith(java.base@21/Thread.java:1596) at java.lang.Thread.run(java.base@21/Thread.java:1583) "sdk-ScheduledExecutor-2-4": at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.getActiveBlock(S3ABlockOutputStream.java:304) - waiting to lock <0x0000000327800000> (a org.apache.hadoop.fs.s3a.S3ABlockOutputStream) at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.close(S3ABlockOutputStream.java:485) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:77) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) at org.apache.parquet.hadoop.util.HadoopPositionOutputStream.close(HadoopPositionOutputStream.java:66) at java.nio.channels.Channels$WritableByteChannelImpl.implCloseChannel(java.base@21/Channels.java:404) at java.nio.channels.spi.AbstractInterruptibleChannel$1.interrupt(java.base@21/AbstractInterruptibleChannel.java:163) - locked <0x00000003af0002a0> (a java.lang.Object) at java.lang.Thread.interrupt(java.base@21/Thread.java:1722) - locked <0x0000000315523c78> (a java.lang.Object) at software.amazon.awssdk.core.internal.http.timers.SyncTimeoutTask.run(SyncTimeoutTask.java:60) - locked <0x00000003af0002e0> (a java.lang.Object) at java.util.concurrent.Executors$RunnableAdapter.call(java.base@21/Executors.java:572) at java.util.concurrent.FutureTask.run(java.base@21/FutureTask.java:317) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@21/ScheduledThreadPoolExecutor.java:304) at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@21/ThreadPoolExecutor.java:1144) at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@21/ThreadPoolExecutor.java:642) at java.lang.Thread.runWith(java.base@21/Thread.java:1596) at java.lang.Thread.run(java.base@21/Thread.java:1583) "io-compute-blocker-15": at software.amazon.awssdk.core.internal.http.timers.SyncTimeoutTask.cancel(SyncTimeoutTask.java:74) - waiting to lock <0x00000003af0001d8> (a java.lang.Object) at software.amazon.awssdk.core.internal.http.timers.ApiCallTimeoutTracker.cancel(ApiCallTimeoutTracker.java:53) at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:77) at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42) at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78) at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40) at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:55) at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:39) at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81) at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36) at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56) at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36) at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80) at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60) at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42) at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50) at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32) at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37) at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26) at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:224) at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103) at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173) at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:80) at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler$$Lambda/0x00007f5d2cb20ca8.get(Unknown Source) at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182) at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74) at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45) at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53) at software.amazon.awssdk.services.s3.DefaultS3Client.createMultipartUpload(DefaultS3Client.java:1463) at software.amazon.awssdk.services.s3.DelegatingS3Client.lambda$createMultipartUpload$4(DelegatingS3Client.java:1232) at software.amazon.awssdk.services.s3.DelegatingS3Client$$Lambda/0x00007f5d2d316118.apply(Unknown Source) at software.amazon.awssdk.services.s3.internal.crossregion.S3CrossRegionSyncClient.invokeOperation(S3CrossRegionSyncClient.java:67) at software.amazon.awssdk.services.s3.DelegatingS3Client.createMultipartUpload(DelegatingS3Client.java:1232) at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$initiateMultipartUpload$30(S3AFileSystem.java:4705) at org.apache.hadoop.fs.s3a.S3AFileSystem$$Lambda/0x00007f5d2d315ef8.get(Unknown Source) at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDurationOfSupplier(IOStatisticsBinding.java:651) at org.apache.hadoop.fs.s3a.S3AFileSystem.initiateMultipartUpload(S3AFileSystem.java:4703) at org.apache.hadoop.fs.s3a.WriteOperationHelper.lambda$initiateMultiPartUpload$0(WriteOperationHelper.java:283) at org.apache.hadoop.fs.s3a.WriteOperationHelper$$Lambda/0x00007f5d2d30e230.apply(Unknown Source) at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:122) at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$4(Invoker.java:376) at org.apache.hadoop.fs.s3a.Invoker$$Lambda/0x00007f5d2d2dd6a0.apply(Unknown Source) at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:468) at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:372) at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:347) at org.apache.hadoop.fs.s3a.WriteOperationHelper.retry(WriteOperationHelper.java:207) at org.apache.hadoop.fs.s3a.WriteOperationHelper.initiateMultiPartUpload(WriteOperationHelper.java:278) at org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload.lambda$new$0(S3ABlockOutputStream.java:904) at org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload$$Lambda/0x00007f5d2d30e000.apply(Unknown Source) at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547) at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528) at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding$$Lambda/0x00007f5d2ca3c918.apply(Unknown Source) at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:449) at org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload.<init>(S3ABlockOutputStream.java:902) at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.initMultipartUpload(S3ABlockOutputStream.java:462) at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.uploadCurrentBlock(S3ABlockOutputStream.java:439) - locked <0x0000000327800000> (a org.apache.hadoop.fs.s3a.S3ABlockOutputStream) at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.write(S3ABlockOutputStream.java:413) - locked <0x0000000327800000> (a org.apache.hadoop.fs.s3a.S3ABlockOutputStream) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:62) at java.io.DataOutputStream.write(java.base@21/DataOutputStream.java:115) - locked <0x0000000327800208> (a org.apache.hadoop.fs.FSDataOutputStream) at org.apache.parquet.hadoop.util.HadoopPositionOutputStream.write(HadoopPositionOutputStream.java:50) at java.nio.channels.Channels$WritableByteChannelImpl.write(java.base@21/Channels.java:392) - locked <0x00000003afab3da8> (a java.lang.Object) at org.apache.parquet.bytes.ConcatenatingByteBufferCollector.writeAllTo(ConcatenatingByteBufferCollector.java:77) at org.apache.parquet.hadoop.ParquetFileWriter.writeColumnChunk(ParquetFileWriter.java:1338) at org.apache.parquet.hadoop.ParquetFileWriter.writeColumnChunk(ParquetFileWriter.java:1259) at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writeToFileWriter(ColumnChunkPageWriteStore.java:408) at org.apache.parquet.hadoop.ColumnChunkPageWriteStore.flushToFileWriter(ColumnChunkPageWriteStore.java:675) at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:210) at org.apache.parquet.hadoop.InternalParquetRecordWriter.checkBlockSizeReached(InternalParquetRecordWriter.java:178) at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:154) at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:428) Found 1 deadlock. {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org