parisni opened a new issue, #7846:
URL: https://github.com/apache/hudi/issues/7846
Hudi 0.12.2
Spark 3.2.1
-----------------
Once an incremental read is made, all subsequent reads of the table keep returning
the incremental result. It looks like something about the incremental query is
being cached (a hedged workaround sketch follows the snippet below).
```scala
// query 1: plain snapshot read
spark.read.format("hudi")
.option("hoodie.metadata.enable","true")
.table("database.hudi_table").count()
// 1000
// query 2: incremental read starting from the given instant
spark.read.format("hudi")
.option("hoodie.metadata.enable","true")
.option("hoodie.datasource.query.type","incremental")
.option("hoodie.datasource.read.begin.instanttime","20230203191804078")
.table("database.hudi_table").count()
// 200
// query 3: plain snapshot read, identical to query 1
spark.read.format("hudi")
.option("hoodie.metadata.enable","true")
.table("database.hudi_table").count()
// 200, but should be 1000 (same read as query 1)
```
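If the stale count comes from Spark caching the resolved relation for the table, invalidating the catalog entry might force a clean re-read. This is only a workaround sketch, not verified against Hudi 0.12.2; whether `refreshTable` also clears whatever Hudi caches about the query type is an assumption.
```scala
// Hedged workaround sketch: invalidate Spark's cached metadata/relation for
// the table, then retry the snapshot read. Assumes (unverified) that the
// stale incremental result lives in a cache this call clears.
spark.catalog.refreshTable("database.hudi_table")

spark.read.format("hudi")
  .option("hoodie.metadata.enable", "true")
  .table("database.hudi_table").count()
// expected: 1000 again, if the cache theory holds
```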
Also odd: query 2 launches far fewer tasks when run in a fresh Spark session:
- query 1 before query 2: 25k tasks for query 2
- query 2 without query 1 (fresh session): 17 tasks
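One hedged way to probe whether a cached incremental relation explains both symptoms: pin the query type back to snapshot explicitly on the third read. If that restores the full count, the `hoodie.datasource.query.type` set in query 2 is likely sticking to the table resolution; this is a diagnostic sketch, not confirmed behavior.
```scala
// Diagnostic sketch: same as query 3, but with the query type forced back
// to snapshot. If this returns 1000 while the unpinned query 3 returns 200,
// the incremental option from query 2 is leaking into later reads.
spark.read.format("hudi")
  .option("hoodie.metadata.enable", "true")
  .option("hoodie.datasource.query.type", "snapshot")
  .table("database.hudi_table").count()
```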