zhangdove opened a new issue #1702:
URL: https://github.com/apache/iceberg/issues/1702


   ### Use Case:
   
   We run a Spark service on Zeppelin that lets other users query Iceberg data online.
   
   1. At time T1, query `prod.db.tb` (`select * from prod.db.tb limit 1`). The current version number is recorded at this point.
   2. At time T2, query `prod.db.tb` a second time.
   
   The interval between T1 and T2 may be a day, a month, or longer. During this interval, an asynchronous job cleans up the small files of the `prod.db.tb` table, including version files.
   
   Because some time has passed since the last query, we refresh the table's metadata before running the second query (in fact, the cached table metadata has already been invalidated by this point):
   
   ```scala
   spark.sql("refresh table prod.db.tb")
   ```
   
   ### Phenomenon:
   ```bash
   spark.sql("refresh table prod.db.tb")
   org.apache.iceberg.exceptions.ValidationException: Metadata file for version 3175 is missing
     at org.apache.iceberg.hadoop.HadoopTableOperations.refresh(HadoopTableOperations.java:100)
     at org.apache.iceberg.BaseTable.refresh(BaseTable.java:49)
     at org.apache.iceberg.spark.SparkCatalog.invalidateTable(SparkCatalog.java:255)
     at org.apache.spark.sql.execution.datasources.v2.RefreshTableExec.run(RefreshTableExec.scala:28)
     at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:39)
     at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:39)
     at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:45)
     at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
     at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3616)
     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
     at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
     at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3614)
     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:229)
     at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
     at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
     at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:606)
     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
     at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:601)
   ```
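
   The trace suggests that the long-lived session still remembers the version it saw at T1 and that the refresh looks for that version's metadata file first. The following is only a toy model of that sequence, not Iceberg's implementation: `MetadataDir`, `CachedTable`, `expireAllBut`, and `StaleRefreshDemo` are hypothetical names, and the refresh logic is an assumption inferred from the exception message.

   ```scala
   import scala.collection.mutable
   import scala.util.Try

   // Hypothetical stand-in for a table's metadata directory: the set of
   // version numbers whose metadata file still exists.
   final class MetadataDir {
     private val versions = mutable.SortedSet.empty[Int]

     // Each commit writes the next version file.
     def commit(): Int = {
       val next = if (versions.isEmpty) 1 else versions.last + 1
       versions += next
       next
     }

     def exists(version: Int): Boolean = versions.contains(version)

     // Newest version on disk; used when a session has no cached version yet.
     def versionHint: Int = versions.last

     // Asynchronous cleanup job: keep only the newest `keep` version files.
     def expireAllBut(keep: Int): Unit = {
       val expired = versions.toList.dropRight(keep)
       versions --= expired
     }
   }

   // Hypothetical stand-in for a session-level table handle that caches the
   // last version it loaded (assumed behavior, inferred from the trace).
   final class CachedTable(dir: MetadataDir) {
     private var version: Option[Int] = None

     def refresh(): Int = {
       // Start from the cached version if there is one, else from the hint.
       var ver = version.getOrElse(dir.versionHint)
       if (!dir.exists(ver)) {
         throw new IllegalStateException(s"Metadata file for version $ver is missing")
       }
       // Walk forward to the newest version that exists.
       while (dir.exists(ver + 1)) ver += 1
       version = Some(ver)
       ver
     }
   }

   object StaleRefreshDemo {
     // Returns (version seen at T1, outcome of the refresh at T2,
     // version seen by a brand-new handle created after the cleanup).
     def run(): (Int, Either[String, Int], Int) = {
       val dir = new MetadataDir
       (1 to 5).foreach(_ => dir.commit())

       val table = new CachedTable(dir)
       val t1 = table.refresh()             // T1: the session caches version 5

       (1 to 10).foreach(_ => dir.commit()) // other writers commit versions 6..15
       dir.expireAllBut(3)                  // cleanup keeps only 13, 14, 15

       // T2: the cached version 5 no longer exists, so the refresh fails.
       val t2 = Try(table.refresh()).toEither.left.map(_.getMessage)

       // A handle without a cached version starts from the hint and succeeds.
       val fresh = new CachedTable(dir).refresh()
       (t1, t2, fresh)
     }
   }
   ```

   In this model, `StaleRefreshDemo.run()` fails at T2 with the same kind of message as the trace above, while a handle created after the cleanup refreshes without error. Whether the real session behaves the same way depends on how the catalog caches table state.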
   
   
   




