zhangdove opened a new issue #1702:
URL: https://github.com/apache/iceberg/issues/1702
### Use Case:
We run a Spark service on Zeppelin that lets other users query Iceberg data
online.
1. At time T1, `prod.db.tb` is queried with `select * from prod.db.tb limit 1`
(the table's version number is recorded at this point).
2. At time T2, a second query is made on `prod.db.tb`.

The interval between T1 and T2 may be a day, a month, or longer. During this
time, an asynchronous job cleans up the small files of the `prod.db.tb`
table, including old version files.
Because some time has passed since the last query, we refresh the metadata of
the current table before querying again (in fact, the cache of the MetaTable
has already been invalidated).
```scala
spark.sql("refresh table prod.db.tb")
```
### Phenomenon:
```bash
spark.sql("refresh table prod.db.tb")
org.apache.iceberg.exceptions.ValidationException: Metadata file for version 3175 is missing
    at org.apache.iceberg.hadoop.HadoopTableOperations.refresh(HadoopTableOperations.java:100)
    at org.apache.iceberg.BaseTable.refresh(BaseTable.java:49)
    at org.apache.iceberg.spark.SparkCatalog.invalidateTable(SparkCatalog.java:255)
    at org.apache.spark.sql.execution.datasources.v2.RefreshTableExec.run(RefreshTableExec.scala:28)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:39)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:39)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:45)
    at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3616)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3614)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:229)
    at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:606)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:601)
```
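
One way to picture the failure is the simplified model below. It is only a sketch under assumptions, not Iceberg's actual code: `StaleVersionDemo`, `CachingReader`, `versionFile`, and the `v<N>.metadata.json` naming are hypothetical names introduced for illustration. The model assumes that the reader remembers the version number it saw at T1 and, on refresh, first looks for the file of that remembered version, which the cleanup job has removed in the meantime.

```scala
import java.nio.file.{Files, Path}

// Simplified, hypothetical model of the scenario; NOT Iceberg's implementation.
object StaleVersionDemo {

  // Assumed naming scheme for version files in a table's metadata directory.
  def versionFile(dir: Path, version: Int): Path =
    dir.resolve(s"v$version.metadata.json")

  // A reader that remembers the version number it saw on the first query (T1).
  final class CachingReader(dir: Path) {
    private var cached: Option[Int] = None

    // First query: record the current version number.
    def load(current: Int): Int = {
      cached = Some(current)
      current
    }

    // Refresh: start from the remembered version and fail if its file is gone.
    def refresh(): Either[String, Int] = cached match {
      case Some(v) if Files.exists(versionFile(dir, v)) => Right(v)
      case Some(v) => Left(s"Metadata file for version $v is missing")
      case None    => Left("Table was never loaded")
    }
  }

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("tb-metadata")
    Files.createFile(versionFile(dir, 3175))

    val reader = new CachingReader(dir)
    reader.load(3175) // T1: version 3175 is recorded

    // Between T1 and T2: the table moves on and cleanup removes the old version file.
    Files.createFile(versionFile(dir, 3176))
    Files.delete(versionFile(dir, 3175))

    // T2: prints Left(Metadata file for version 3175 is missing)
    println(reader.refresh())
  }
}
```

In this model the refresh fails even though a newer version file (3176) exists, because the lookup starts from the remembered version rather than from the latest one on disk.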