parisni opened a new issue, #7846:
URL: https://github.com/apache/hudi/issues/7846

   hudi 0.12.2
   spark 3.2.1
   -----------------
   
   Once an incremental read is made, all subsequent reads on the table return the incremental result instead of the full snapshot. Likely something about the incremental query options is cached in the session.
   
   ```scala
   // query 1
   spark.read.format("hudi")
   .option("hoodie.metadata.enable","true")
   .table("database.hudi_table").count()
   // 1000
   
   // query 2
   spark.read.format("hudi")
   .option("hoodie.metadata.enable","true")
   .option("hoodie.datasource.query.type","incremental")
   .option("hoodie.datasource.read.begin.instanttime","20230203191804078")
   .table("database.hudi_table").count()
   // 200
   
   // query 3
   spark.read.format("hudi")
   .option("hoodie.metadata.enable","true")
   .table("database.hudi_table").count()
   // 200, but should be 1000 (full snapshot count)
   ```
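
   If this is session-level caching of the read options, a possible workaround (a sketch, untested against this exact setup) is to explicitly reset `hoodie.datasource.query.type` to its documented default `snapshot`, or to invalidate any cached relation with the standard Spark `spark.catalog.refreshTable` call:
   
   ```scala
   // workaround sketch: force the query type back to the snapshot default
   spark.read.format("hudi")
   .option("hoodie.metadata.enable","true")
   .option("hoodie.datasource.query.type","snapshot")
   .table("database.hudi_table").count()
   // expected: 1000
   
   // alternatively, drop any cached relation for the table before re-reading
   spark.catalog.refreshTable("database.hudi_table")
   ```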
   
   Also odd: the number of tasks for query 2 is far smaller when it runs in a fresh Spark session:
   - query 1 before query 2: 25k tasks
   - query 2 alone, in a fresh session: 17 tasks
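
   A quick way to compare what each session actually plans to scan (a diagnostic sketch using standard Spark APIs only) is to inspect the resolved input files and partitions of the incremental read in both cases:
   
   ```scala
   // diagnostic sketch: compare resolved files/partitions across sessions
   val df = spark.read.format("hudi")
   .option("hoodie.metadata.enable","true")
   .option("hoodie.datasource.query.type","incremental")
   .option("hoodie.datasource.read.begin.instanttime","20230203191804078")
   .table("database.hudi_table")
   
   println(df.inputFiles.length)    // files the relation resolves to
   println(df.rdd.getNumPartitions) // partitions ~ tasks in the scan stage
   ```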
   
   

