jotarada opened a new issue #4194:
URL: https://github.com/apache/iceberg/issues/4194
After running the deleteOrphanFiles action on a table, it dropped several files.
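For context, the action was run roughly like the sketch below (a minimal sketch, not our exact code: the table identifier `db.table` and the retention window are placeholders):

```scala
import java.util.concurrent.TimeUnit
import scala.collection.JavaConverters._
import org.apache.iceberg.spark.Spark3Util
import org.apache.iceberg.spark.actions.SparkActions

// Placeholder identifier; the real table name is omitted here.
val table = Spark3Util.loadIcebergTable(spark, "db.table")

// Delete files under the table location that no snapshot references,
// keeping anything written inside the retention window.
val result = SparkActions.get()
  .deleteOrphanFiles(table)
  .olderThan(System.currentTimeMillis() - TimeUnit.DAYS.toMillis(3))
  .execute()

println(s"deleted ${result.orphanFileLocations().asScala.size} files")
```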
After that we couldn't use the table anymore:
```
ERROR org.apache.spark.deploy.yarn.Client: Application diagnostics message: User class threw exception: org.apache.spark.SparkException: Writing job aborted.
    at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:388)
    at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:336)
    at org.apache.spark.sql.execution.datasources.v2.ReplaceDataExec.writeWithV2(ReplaceDataExec.scala:26)
    at org.apache.spark.sql.execution.datasources.v2.ReplaceDataExec.run(ReplaceDataExec.scala:34)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:40)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:40)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:46)
    at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228)
    at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
    at com.jorge.data.writer.WriterClass.$anonfun$write$6(WriterClass.scala:179)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:982)
    at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:53)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:67)
    at scala.collection.parallel.Task.tryLeaf(Tasks.scala:56)
    at scala.collection.parallel.Task.tryLeaf$(Tasks.scala:50)
    at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:979)
    at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute(Tasks.scala:153)
    at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute$(Tasks.scala:149)
    at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:440)
    at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
Caused by: org.apache.iceberg.exceptions.NotFoundException: Failed to open input stream for file: gs://bucket-name/iceberg/schema/table/metadata/37725537-5a23-40a1-a59c-0a365e68c202-m0.avro
    at org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:177)
    at org.apache.iceberg.avro.AvroIterable.newFileReader(AvroIterable.java:101)
    at org.apache.iceberg.avro.AvroIterable.getMetadata(AvroIterable.java:66)
    at org.apache.iceberg.ManifestReader.<init>(ManifestReader.java:103)
    at org.apache.iceberg.ManifestFiles.read(ManifestFiles.java:87)
    at org.apache.iceberg.SnapshotProducer.newManifestReader(SnapshotProducer.java:378)
    at org.apache.iceberg.MergingSnapshotProducer$DataFileMergeManager.newManifestReader(MergingSnapshotProducer.java:682)
    at org.apache.iceberg.ManifestMergeManager.createManifest(ManifestMergeManager.java:158)
    at org.apache.iceberg.ManifestMergeManager.lambda$mergeGroup$1(ManifestMergeManager.java:139)
    at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:405)
    at org.apache.iceberg.util.Tasks$Builder.access$300(Tasks.java:71)
    at org.apache.iceberg.util.Tasks$Builder$1.run(Tasks.java:311)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: Item not found: 'gs://bucket-name/iceberg/schema/table/metadata/37725537-5a23-40a1-a59c-0a365e68c202-m0.avro'. Note, it is possible that the live version is still available but the requested generation is deleted.
```
From the deleteOrphanFiles task logs I can see that this manifest is one of the files that were deleted. Somehow the table metadata is inconsistent with the files that actually exist in storage. Is there a way to fix it?
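To illustrate the inconsistency, here is a rough check (a sketch, assuming a Spark shell with the Iceberg runtime on the classpath; `db.table` again stands in for the real table name) that walks the manifests still referenced by the table's snapshots and reports the ones missing from the bucket:

```scala
import scala.collection.JavaConverters._
import org.apache.iceberg.spark.Spark3Util

// Placeholder identifier; the real table name is omitted here.
val table = Spark3Util.loadIcebergTable(spark, "db.table")

// Collect every manifest path referenced by any snapshot, then probe
// the underlying Avro files through the table's FileIO.
val referenced = table.snapshots().asScala
  .flatMap(_.allManifests().asScala)
  .map(_.path())
  .toSet

referenced.filterNot(path => table.io().newInputFile(path).exists())
  .foreach(path => println(s"referenced but missing: $path"))
```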