jfrylings-twilio commented on issue #8325:
URL: https://github.com/apache/hudi/issues/8325#issuecomment-1535478999

   I'm seeing the same issue with Hudi 0.8 and Spark 3.2.1, but without using Flink 
or the metadata table.
   (I know this combination of Hudi/Spark versions is not officially supported.)
   
   This job continues to fail repeatedly with the same error, even after manually 
restarting it. It had been working for months with no issues, and there have been no 
recent code changes. Other jobs using the same code but a different input data 
source and a different output continue to work fine; those jobs run in the same 
cluster and are not having any connectivity issues to S3.
   
   The file it is erroring on does not exist, and there is nothing in that 
directory. I haven't redacted the file name below; it really is mostly zeros:
   
`s3a://<bucket>/<path>/.hoodie/.aux/.bootstrap/.partitions/00000000-0000-0000-0000-000000000000-0_1-0-1_00000000000001.hfile`
   
   ```
   23/05/04 21:50:31 WARN TaskSetManager: Lost task 120.0 in stage 26.0 (TID 
10224) (10.221.232.192 executor 1): 
org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType 
UPDATE for partition :120
       at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:288)
       at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$execute$ecf5068c$1(BaseSparkCommitActionExecutor.java:139)
       at 
org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
       at 
org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
       at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
       at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
       at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
       at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
       at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386)
       at 
org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498)
       at 
org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408)
       at 
org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472)
       at 
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1295)
       at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
       at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
       at org.apache.spark.scheduler.Task.run(Task.scala:131)
       at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
       at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
Source)
       at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
Source)
       at java.base/java.lang.Thread.run(Unknown Source)
   Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to close 
UpdateHandle
       at org.apache.hudi.io.HoodieMergeHandle.close(HoodieMergeHandle.java:359)
       at 
org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:107)
       at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:317)
       at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:308)
       at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:281)
       ... 28 more
   Caused by: java.io.InterruptedIOException: Writing Object on 
<s3_path>/<a_new_file_that_doesn't_exist>.parquet: 
com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout 
waiting for connection from pool
       at 
org.apache.hadoop.fs.s3a.S3AUtils.translateInterruptedException(S3AUtils.java:389)
       at 
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:196)
       at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:117)
       at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$4(Invoker.java:320)
       at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:412)
       at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:316)
       at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:291)
       at 
org.apache.hadoop.fs.s3a.WriteOperationHelper.retry(WriteOperationHelper.java:168)
       at 
org.apache.hadoop.fs.s3a.WriteOperationHelper.putObject(WriteOperationHelper.java:515)
       at 
org.apache.hadoop.fs.s3a.S3ABlockOutputStream.lambda$putObject$0(S3ABlockOutputStream.java:548)
       at 
org.apache.hadoop.thirdparty.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
       at 
org.apache.hadoop.thirdparty.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
       at 
org.apache.hadoop.thirdparty.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
       at 
org.apache.hadoop.util.SemaphoredDelegatingExecutor$RunnableWithPermitRelease.run(SemaphoredDelegatingExecutor.java:196)
       at 
org.apache.hadoop.util.SemaphoredDelegatingExecutor$RunnableWithPermitRelease.run(SemaphoredDelegatingExecutor.java:196)
       ... 3 more
   Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: 
Timeout waiting for connection from pool
       at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1207)
       at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1153)
       at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
       at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
       at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
       at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
       at 
com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
       at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
       at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
       at 
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5227)
       at 
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5173)
       at 
com.amazonaws.services.s3.AmazonS3Client.access$300(AmazonS3Client.java:415)
       at 
com.amazonaws.services.s3.AmazonS3Client$PutObjectStrategy.invokeServiceCall(AmazonS3Client.java:6289)
       at 
com.amazonaws.services.s3.AmazonS3Client.uploadObject(AmazonS3Client.java:1834)
       at 
com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1794)
       at 
org.apache.hadoop.fs.s3a.S3AFileSystem.putObjectDirect(S3AFileSystem.java:2432)
       at 
org.apache.hadoop.fs.s3a.WriteOperationHelper.lambda$putObject$6(WriteOperationHelper.java:517)
       at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:115)
       ... 15 more
   Caused by: 
com.amazonaws.thirdparty.apache.http.conn.ConnectionPoolTimeoutException: 
Timeout waiting for connection from pool
       at 
com.amazonaws.thirdparty.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:316)
       at 
com.amazonaws.thirdparty.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:282)
       at jdk.internal.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
       at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown 
Source)
       at java.base/java.lang.reflect.Method.invoke(Unknown Source)
       at 
com.amazonaws.http.conn.ClientConnectionRequestFactory$Handler.invoke(ClientConnectionRequestFactory.java:70)
       at com.amazonaws.http.conn.$Proxy20.get(Unknown Source)
       at 
com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:190)
       at 
com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
       at 
com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
       at 
com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
       at 
com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
       at 
com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
       at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1333)
       at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
       ... 31 more
   
       at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
       at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
       at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386)
       at 
org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498)
       at 
org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408)
       at 
org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472)
       at 
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1295)
       at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
       at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
       at org.apache.spark.scheduler.Task.run(Task.scala:131)
       at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
       at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
Source)
       at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
Source)
       at java.base/java.lang.Thread.run(Unknown Source)
   Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate 
class 
       at 
org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:89)
       at 
org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:98)
       at 
org.apache.hudi.common.bootstrap.index.BootstrapIndex.getBootstrapIndex(BootstrapIndex.java:159)
       at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:107)
       at 
org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106)
       at 
org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:100)
       at 
org.apache.hudi.common.table.view.FileSystemViewManager.createInMemoryFileSystemView(FileSystemViewManager.java:167)
       at 
org.apache.hudi.common.table.view.FileSystemViewManager.lambda$createViewManager$8894a6ab$1(FileSystemViewManager.java:255)
       at 
org.apache.hudi.common.table.view.FileSystemViewManager.lambda$getFileSystemView$1(FileSystemViewManager.java:110)
       at 
java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(Unknown Source)
       at 
org.apache.hudi.common.table.view.FileSystemViewManager.getFileSystemView(FileSystemViewManager.java:109)
       at 
org.apache.hudi.table.HoodieTable.getBaseFileOnlyView(HoodieTable.java:264)
       at 
org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:111)
       at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.getUpdateHandle(BaseSparkCommitActionExecutor.java:335)
       at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:307)
       at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:281)
       ... 28 more
   Caused by: java.lang.reflect.InvocationTargetException
       at 
jdk.internal.reflect.GeneratedConstructorAccessor87.newInstance(Unknown Source)
       at 
java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown
 Source)
       at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
       at 
org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:87)
       ... 43 more
   Caused by: org.apache.hudi.exception.HoodieIOException: getFileStatus on 
s3a://<bucket>/<path>/.hoodie/.aux/.bootstrap/.partitions/00000000-0000-0000-0000-000000000000-0_1-0-1_00000000000001.hfile:
 com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout 
waiting for connection from pool
       at 
org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex.<init>(HFileBootstrapIndex.java:104)
       ... 47 more
   Caused by: java.io.InterruptedIOException: getFileStatus on 
s3a://<bucket>/<path>/.hoodie/.aux/.bootstrap/.partitions/00000000-0000-0000-0000-000000000000-0_1-0-1_00000000000001.hfile:
 com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout 
waiting for connection from pool
       at 
org.apache.hadoop.fs.s3a.S3AUtils.translateInterruptedException(S3AUtils.java:389)
       at 
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:196)
       at 
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:170)
       at 
org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3289)
       at 
org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3185)
       at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3053)
       at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760)
       at org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:4263)
       at 
org.apache.hudi.common.fs.HoodieWrapperFileSystem.exists(HoodieWrapperFileSystem.java:549)
       at 
org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex.<init>(HFileBootstrapIndex.java:102)
       ... 47 more
   Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: 
Timeout waiting for connection from pool
       at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1207)
       at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1153)
       at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
       at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
       at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
       at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
       at 
com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
       at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
       at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
       at 
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5227)
       at 
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5173)
       at 
com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1360)
       at 
org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$6(S3AFileSystem.java:2066)
       at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:412)
       at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:375)
       at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2056)
       at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2032)
       at 
org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3273)
       ... 53 more
   Caused by: 
com.amazonaws.thirdparty.apache.http.conn.ConnectionPoolTimeoutException: 
Timeout waiting for connection from pool
       at 
com.amazonaws.thirdparty.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:316)
       at 
com.amazonaws.thirdparty.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:282)
       at jdk.internal.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
       at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown 
Source)
       at java.base/java.lang.reflect.Method.invoke(Unknown Source)
       at 
com.amazonaws.http.conn.ClientConnectionRequestFactory$Handler.invoke(ClientConnectionRequestFactory.java:70)
       at com.amazonaws.http.conn.$Proxy20.get(Unknown Source)
       at 
com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:190)
       at 
com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
       at 
com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
       at 
com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
       at 
com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
       at 
com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
       at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1333)
       at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
       ... 69 more
   ```
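   
   In case it helps others hitting this: the root cause in both stack traces is S3A exhausting its HTTP connection pool (`ConnectionPoolTimeoutException: Timeout waiting for connection from pool`). A mitigation that has helped in similar reports (an assumption on my part, not confirmed as the fix for this exact case) is to raise the S3A pool size and cap the upload threads so concurrent writers can't starve the pool:
   
   ```shell
   # Hypothetical tuning sketch; values are illustrative, not recommendations.
   # fs.s3a.connection.maximum  - max pooled HTTP connections to S3
   # fs.s3a.threads.max         - max threads for S3A block uploads
   # fs.s3a.connection.timeout  - socket timeout in milliseconds
   spark-submit \
     --conf spark.hadoop.fs.s3a.connection.maximum=200 \
     --conf spark.hadoop.fs.s3a.threads.max=64 \
     --conf spark.hadoop.fs.s3a.connection.timeout=200000 \
     my_hudi_job.py   # placeholder for the actual job entry point
   ```
   
   That said, it doesn't explain why only this one job is affected while others in the same cluster are fine.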
   
   