Vitali Makarevich created HUDI-7034:
---------------------------------------
Summary: Refresh view does not work (due to cache)
Key: HUDI-7034
URL: https://issues.apache.org/jira/browse/HUDI-7034
Project: Apache Hudi
Issue Type: Bug
Reporter: Vitali Makarevich
Starting from 0.13.1, `spark.catalog.refreshTable` works incorrectly. In 0.12.3
it works correctly.
A reproduction is available
[here|https://github.com/VitoMakarevich/hudi-incremental-issue/blob/master/src/main/scala/com/example/hudi/HudiRefreshBug.scala].
What is happening: Hudi has a `BaseHoodieTableFileIndex` class, and an instance
of it is saved in the Spark plan once the table is created. When I call refresh,
the respective method `doRefresh` is called. This method reloads the metadata
view and the list of partitions, but it no longer refreshes the list of files in
those partitions. As a result, each partition stays stuck at the first file
version it saw: updates are not picked up, and after a couple of commits
(depending on the cleaner settings) Spark starts to throw a file-not-found
exception.
More precisely - it looks to be broken in
[this commit|https://github.com/apache/hudi/commit/34b226c0cba7ff022eb8c02246f46c5f9cbe7ec5]
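To make the failure mode concrete, here is a minimal toy model of the behaviour described above. It does not use Hudi or Spark: `ToyStorage`, `ToyFileIndex` and the `reload_files` flag are hypothetical names invented for illustration, and the real `BaseHoodieTableFileIndex` is considerably more involved. The sketch only assumes that the index caches the file listing per partition and that a refresh may or may not drop that cache.

```python
class ToyStorage:
    """Stands in for the table on disk: partition -> current file names."""

    def __init__(self):
        self.files = {}

    def commit(self, partition, file_name):
        # Each commit replaces the partition's files with a new version,
        # which is roughly what remains after the cleaner removes old ones.
        self.files[partition] = [file_name]


class ToyFileIndex:
    """Hypothetical simplification of a cached file index (not Hudi's class)."""

    def __init__(self, storage):
        self.storage = storage
        self.partitions = []
        self.files_by_partition = {}
        self.refresh(reload_files=True)

    def refresh(self, reload_files):
        # The partition list is always reloaded.
        self.partitions = sorted(self.storage.files)
        # Assumed bug: when reload_files is False the per-partition
        # file cache survives the refresh.
        if reload_files:
            self.files_by_partition = {}

    def list_files(self, partition):
        # Files are listed lazily and then served from the cache.
        if partition not in self.files_by_partition:
            self.files_by_partition[partition] = list(self.storage.files[partition])
        return self.files_by_partition[partition]


storage = ToyStorage()
storage.commit("p=1", "file_v1.parquet")
index = ToyFileIndex(storage)
index.list_files("p=1")  # populates the cache with the first file version

storage.commit("p=1", "file_v2.parquet")

index.refresh(reload_files=False)      # refresh that keeps the file cache
stale_listing = index.list_files("p=1")

index.refresh(reload_files=True)       # refresh that also drops the file cache
fresh_listing = index.list_files("p=1")

print("stale:", stale_listing)
print("fresh:", fresh_listing)
```

In this model the stale listing still points at `file_v1.parquet`, which no longer exists in storage. That corresponds to the file-not-found exception seen once the cleaner removes old file versions.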
I can try to provide a fix.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)