Hey Saulius,

Thanks for reaching out.
> once an Iceberg table is mutated outside of Impala one has to run a refresh
> or invalidate statement

If the Iceberg table lives in the HiveCatalog, and Automatic Invalidation/Refresh of Metadata is enabled, then you don't need to, i.e. Impala will eventually pick up the new table state. See
https://impala.apache.org/docs/build/html/topics/impala_metadata.html

> We noticed that running refresh on huge tables can take minutes and while
> that is happening querying them is blocked

What version do you use? There were quite a few improvements in that area lately. Though a major improvement is coming in 4.5.0: IMPALA-13254
<https://issues.apache.org/jira/browse/IMPALA-13254>

> We have large event tables that are being updated very frequently in
> real-time

I'm a bit curious about what "very frequent" means here. Is it possible for you to share some numbers?

> What would the recommendation here be?

Until Impala 4.5 you can try reducing the frequency of table updates. Also, the number of files plays a huge role in table loading times, so maybe you can try compacting the table from time to time (there is a rough sketch in the P.S. below).

Cheers,
Zoltan

On Wed, Jan 8, 2025 at 1:48 PM Saulius Valatka <saulius...@gmail.com> wrote:

> Hi,
>
> If I understand correctly, once an Iceberg table is mutated outside of
> Impala one has to run a refresh or invalidate statement. We noticed that
> running refresh on huge tables can take minutes and while that is happening
> querying them is blocked. We have large event tables that are being updated
> very frequently in real-time, by default we run a refresh after each
> update, so effectively this means such tables are un-queryable, as they're
> constantly being refreshed.
>
> Is there something I'm missing? What would the recommendation here be?
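P.S. In case a concrete sketch helps, this is roughly what I had in mind. The table and catalog names below are just placeholders, and please double-check the exact flag and procedure names against the docs for the versions you run:

    -- catalogd startup flag that enables HMS-event-based automatic
    -- invalidation/refresh; the value is the polling interval in seconds:
    --   --hms_event_polling_interval_s=2

    -- manual alternative after each outside write to the table:
    REFRESH db.events;

    -- periodic compaction to keep the file count down, e.g. from Spark SQL
    -- with the Iceberg extensions enabled ('my_catalog' and 'db.events'
    -- are placeholders for your catalog and table):
    CALL my_catalog.system.rewrite_data_files(table => 'db.events');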