codope commented on issue #12477:
URL: https://github.com/apache/hudi/issues/12477#issuecomment-2547586697

   > My assumption is that in Spark SQL we are unable to set 
`hoodie.file.index.enable` as false and thus the error of 
`FileNotFoundException` occurs.
   
   You could set it as a Spark session config.
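
   As a sketch, in a `spark-sql` shell or via `spark.sql(...)` you could set it 
for the whole session (this disables the Hudi file index for every query in 
that session, not just one read):

   ```sql
   -- Disable the Hudi file index for the current Spark session
   SET hoodie.file.index.enable=false;
   ```

   The same key can also be passed at session construction, e.g. through 
`spark.conf.set("hoodie.file.index.enable", "false")`.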
   
   > Scenario is a read SQL is setup and while the read operation is underway 
an independent write operation to the same table is done which causes a failure 
on the initial read operation initiated through Spark SQL.
   
   Generally speaking, Hudi guarantees snapshot isolation between writers and 
readers through its timeline and multi-version concurrency control. Hudi also 
does not delete the last version of any data file unless the cleaner is 
configured to do so (your configs suggest no change to the default cleaner 
configs). I would like to understand more about your use case, and in 
particular how the file is getting deleted. Are you using OSS Hudi or EMR Hudi? 
If it's the latter, did you also try with the 0.15.0 version of OSS Hudi? Could 
you zip the `.hoodie` folder under the base path of the erroneous table and 
share it with us?
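
   For reference, these are the cleaner knobs I mean (a sketch showing the 
documented defaults, not a recommendation to change them; please double-check 
against the config reference for your Hudi version):

   ```sql
   -- Default cleaner policy: retain the file versions needed by the
   -- last N commits, so in-flight readers keep a consistent snapshot
   SET hoodie.cleaner.policy=KEEP_LATEST_COMMITS;
   SET hoodie.cleaner.commits.retained=10;
   ```

   With these defaults, a reader should only hit `FileNotFoundException` if its 
query outlives the retention window or something outside Hudi removed the file.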
   
   We have many production use cases with concurrent read and write scenarios, 
and data freshness latency of just a few minutes. For example: 
https://aws.amazon.com/blogs/big-data/how-nerdwallet-uses-aws-and-apache-hudi-to-build-a-serverless-real-time-analytics-platform/
 
   
   If it's just a single writer and multiple readers, Hudi employs MVCC by 
default. I will need to review the script shared above to understand further 
what's going on.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
