codope commented on issue #12477: URL: https://github.com/apache/hudi/issues/12477#issuecomment-2547586697
> My assumption is that in Spark SQL we are unable to set `hoodie.file.index.enable` as false and thus the error of `FileNotFoundException` occurs.

You could set it as a Spark session config.

> Scenario is a read SQL is setup and while the read operation is underway an independent write operation to the same table is done which causes a failure on the initial read operation initiated through Spark SQL.

Generally speaking, Hudi guarantees snapshot isolation between writers and readers through its timeline and multi-version concurrency control (MVCC). Hudi does not delete the last version of any data file unless the cleaner is configured to do so (your configs suggest no change to the default cleaner configs). I would like to understand more about your use case, and also how the file is getting deleted:

- Are you using OSS Hudi or EMR Hudi? If it's the latter, did you also try with the 0.15.0 version of OSS Hudi?
- Could you zip the `.hoodie` folder under the base path of the erroneous table and share it with us?

We have many production use cases with concurrent read and write scenarios and a data freshness latency of just a few minutes. For example: https://aws.amazon.com/blogs/big-data/how-nerdwallet-uses-aws-and-apache-hudi-to-build-a-serverless-real-time-analytics-platform/

If it's just a single writer and multiple readers, Hudi employs MVCC by default. I will need to review the script shared above to understand further what's going on.
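As a sketch of the session-config suggestion above: in a Spark SQL session the property can be set with a `SET` statement before the read query runs (the table name here is hypothetical; the exact behavior depends on your Hudi and Spark versions).

```sql
-- Disable Hudi's file index for the current Spark session (sketch).
SET hoodie.file.index.enable=false;

-- Subsequent reads in this session pick up the config;
-- `my_hudi_table` is a placeholder for your actual table.
SELECT * FROM my_hudi_table LIMIT 10;
```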
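To share the timeline metadata requested above, packaging the `.hoodie` folder under the table's base path is enough; a minimal sketch (the base path here is hypothetical, and a tarball works as well as a zip for sharing):

```shell
# Hypothetical base path; substitute your Hudi table's actual location.
BASE_PATH="/tmp/hudi/my_table"
mkdir -p "${BASE_PATH}/.hoodie"   # placeholder so this sketch runs end-to-end

# Package the Hudi timeline metadata for sharing.
tar -czf /tmp/hoodie_metadata.tar.gz -C "${BASE_PATH}" .hoodie
```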
