huangxiaopingRD opened a new pull request, #8352:
URL: https://github.com/apache/hudi/pull/8352
### Change Logs
Spark will cache some meta information of the table. After the
RollbackToInstantTimeProcedure is executed on the table, the meta information
will change and the table needs to be refreshed. Otherwise, the following error
will occur when querying the data again:
```
Caused by: java.io.FileNotFoundException: File does not exist:
hdfs://xxxxx/user/hive/warehouse/hudi_cow_nonpcf_tbl2/7a19abfb-35ab-40bb-9580-6b1af681506a-0_0-23-20_20230402002001284.parquet
It is possible the underlying files have been updated. You can explicitly
invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in
SQL or by recreating the Dataset/DataFrame involved.
at
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:124)
at
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:187)
at
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
at
org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:503)
```
### Impact
No
### Risk level (write none, low medium or high below)
none
### Documentation Update
### Contributor's checklist
- [ ] Read through [contributor's
guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]