ubyyj commented on issue #3636:
URL: https://github.com/apache/iceberg/issues/3636#issuecomment-1096187335
Here is the full call stack:
```
22/04/12 06:02:30 WARN hadoop.HadoopStreams: Unclosed input stream created by:
org.apache.iceberg.hadoop.HadoopStreams$HadoopSeekableInputStream.<init>(HadoopStreams.java:81)
org.apache.iceberg.hadoop.HadoopStreams.wrap(HadoopStreams.java:56)
org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:175)
org.apache.iceberg.avro.AvroIterable.newFileReader(AvroIterable.java:101)
org.apache.iceberg.avro.AvroIterable.iterator(AvroIterable.java:77)
org.apache.iceberg.io.CloseableIterable$4$1.<init>(CloseableIterable.java:99)
org.apache.iceberg.io.CloseableIterable$4.iterator(CloseableIterable.java:98)
org.apache.iceberg.io.CloseableIterable$4.iterator(CloseableIterable.java:90)
java.lang.Iterable.forEach(Iterable.java:74)
org.apache.iceberg.ManifestFilterManager.filterManifestWithDeletedFiles(ManifestFilterManager.java:385)
org.apache.iceberg.ManifestFilterManager.filterManifest(ManifestFilterManager.java:308)
org.apache.iceberg.ManifestFilterManager.lambda$filterManifests$0(ManifestFilterManager.java:186)
org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:404)
org.apache.iceberg.util.Tasks$Builder.access$300(Tasks.java:70)
org.apache.iceberg.util.Tasks$Builder$1.run(Tasks.java:310)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
```
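
For context, that WARN is Iceberg's unclosed-stream detector: a SeekableInputStream opened through HadoopInputFile.newStream() was detected as never having been closed. If many such streams pile up, each one can hold on to an S3A HTTP connection, which would be consistent with the pool timeout in the error below. A minimal sketch of how such streams are normally released (hypothetical reader code, not the actual code path in the trace):

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.iceberg.io.InputFile;
import org.apache.iceberg.io.SeekableInputStream;

class StreamExample {
  // Hypothetical illustration: streams returned by InputFile.newStream() are
  // Closeable and should be released promptly (try-with-resources) so they do
  // not pin S3A connections until garbage collection.
  static byte[] readPrefix(InputFile file, int n) throws IOException {
    try (SeekableInputStream in = file.newStream()) {
      byte[] buf = new byte[n];
      int read = in.read(buf);   // the stream is closed when this block exits
      return Arrays.copyOf(buf, Math.max(read, 0));
    }
  }
}
```
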
```
22/04/12 06:03:00 ERROR v2.OverwritePartitionsDynamicExec: Data source write support IcebergBatchWrite(table=spark_catalog.default.mytable, format=PARQUET) aborted.
org.apache.spark.SparkException: Writing job aborted
    at org.apache.spark.sql.errors.QueryExecutionErrors$.writingJobAbortedError(QueryExecutionErrors.scala:613)
    at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:386)
    at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:330)
    at org.apache.spark.sql.execution.datasources.v2.OverwritePartitionsDynamicExec.writeWithV2(WriteToDataSourceV2Exec.scala:271)
    at org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run(WriteToDataSourceV2Exec.scala:309)
    at org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run$(WriteToDataSourceV2Exec.scala:308)
    at org.apache.spark.sql.execution.datasources.v2.OverwritePartitionsDynamicExec.run(WriteToDataSourceV2Exec.scala:271)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
    at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219)
    at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:67)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:384)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:504)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:498)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:498)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:287)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.iceberg.exceptions.RuntimeIOException: Failed to create file: s3a://mybucket/mytable/metadata/b1506cbf-2dd4-42b3-9dc4-81d798a0ca75-m14.avro
    at org.apache.iceberg.hadoop.HadoopOutputFile.createOrOverwrite(HadoopOutputFile.java:87)
    at org.apache.iceberg.avro.AvroFileAppender.<init>(AvroFileAppender.java:51)
    at org.apache.iceberg.avro.Avro$WriteBuilder.build(Avro.java:198)
    at org.apache.iceberg.ManifestWriter$V2Writer.newAppender(ManifestWriter.java:211)
    at org.apache.iceberg.ManifestWriter.<init>(ManifestWriter.java:58)
    at org.apache.iceberg.ManifestWriter.<init>(ManifestWriter.java:34)
    at org.apache.iceberg.ManifestWriter$V2Writer.<init>(ManifestWriter.java:189)
    at org.apache.iceberg.ManifestFiles.write(ManifestFiles.java:119)
    at org.apache.iceberg.SnapshotProducer.newManifestWriter(SnapshotProducer.java:370)
    at org.apache.iceberg.MergingSnapshotProducer$DataFileFilterManager.newManifestWriter(MergingSnapshotProducer.java:711)
    at org.apache.iceberg.ManifestFilterManager.filterManifestWithDeletedFiles(ManifestFilterManager.java:383)
    at org.apache.iceberg.ManifestFilterManager.filterManifest(ManifestFilterManager.java:308)
    at org.apache.iceberg.ManifestFilterManager.lambda$filterManifests$0(ManifestFilterManager.java:186)
    at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:404)
    at org.apache.iceberg.util.Tasks$Builder.access$300(Tasks.java:70)
    at org.apache.iceberg.util.Tasks$Builder$1.run(Tasks.java:310)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.InterruptedIOException: getFileStatus on s3a://mybucket/mytable/metadata/b1506cbf-2dd4-42b3-9dc4-81d798a0ca75-m14.avro: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:125)
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:101)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1571)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:555)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:932)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:913)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:810)
    at org.apache.iceberg.hadoop.HadoopOutputFile.createOrOverwrite(HadoopOutputFile.java:85)
    ... 20 more
Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1175)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1121)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4926)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4872)
    at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1321)
    at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1295)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:904)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1553)
```
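
The innermost cause is the S3A client running out of pooled HTTP connections ("Timeout waiting for connection from pool") while the commit rewrites many manifests in parallel. One thing that might be worth trying, purely as a sketch and not a confirmed fix, is enlarging the S3A connection pool before the write; fs.s3a.connection.maximum and fs.s3a.threads.max are standard Hadoop S3A properties, and the values below are only illustrative:

```java
import org.apache.spark.sql.SparkSession;

public class LargerS3aPool {
  public static void main(String[] args) {
    // Sketch of a possible workaround, assuming plain pool exhaustion (values
    // illustrative). spark.hadoop.* settings are copied into the Hadoop
    // Configuration, so they must be in place before the S3A FileSystem for
    // the bucket is first created.
    SparkSession spark = SparkSession.builder()
        .appName("iceberg-dynamic-overwrite")
        .config("spark.hadoop.fs.s3a.connection.maximum", "100")
        .config("spark.hadoop.fs.s3a.threads.max", "64")
        .getOrCreate();

    // ... run the INSERT OVERWRITE here, e.g. spark.sql("<your statement>")
    spark.stop();
  }
}
```

If the streams flagged by the WARN above are genuinely leaking, a bigger pool only delays the timeout, so this would be a mitigation rather than a root-cause fix.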