My understanding is that automatic refresh is also supposed to be more efficient about performing catalog updates, so it would be worth experimenting with it.
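For reference, a rough sketch of the knobs involved in event-based automatic metadata sync (the flag and property names are from the Impala/Hive docs; the specific values here are illustrative assumptions, not recommendations):

```shell
# Hive Metastore side (hive-site.xml, shown as key=value for brevity):
# emit notification events when tables/partitions change, so Impala can
# pick up external writes without a manual REFRESH.
#   hive.metastore.transactional.event.listeners=org.apache.hive.hcatalog.listener.DbNotificationListener
#   hive.metastore.dml.events=true

# Impala side: have catalogd poll the HMS notification log.
# A polling interval > 0 enables automatic invalidation/refresh;
# 2 seconds is just an example value.
catalogd --hms_event_polling_interval_s=2
```

With this enabled, writes that go through an engine that fires HMS events (e.g. Hive or Spark with the HiveCatalog) should become visible in Impala without issuing REFRESH per update, which is what the suggestion above is getting at.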
On Wed, Jan 8, 2025 at 5:45 AM Saulius Valatka <saulius...@gmail.com> wrote:

> Hi,
>
> Sorry, maybe I worded my question wrong: I understand that refreshing is
> needed (either automatic or manual). My main concerns are the latency of
> the refresh and the fact that the table is not queryable while it's being
> refreshed. For large tables that are updated frequently, this combination
> makes them essentially un-queryable.
>
> On Wed, 2025-01-08 at 15:17, Gabor Kaszab <gaborkas...@apache.org> wrote:
>
> > Hi,
> >
> > I don't think the issue you describe is specific to Iceberg, in the
> > sense that even for Hive tables, if you make changes using an engine
> > that doesn't trigger HMS events, one has to issue REFRESH or
> > INVALIDATE METADATA to see the changes reflected in Impala.
> > Could you share what catalog you use for your Iceberg tables? And what
> > tool do you use for data ingestion into these tables?
> > If you use the HMS-backed HiveCatalog as a catalog and an engine that
> > triggers HMS notifications, like Spark or Hive, then even for Iceberg
> > tables you can avoid executing REFRESH manually.
> >
> > Gabor
> >
> > On Wed, Jan 8, 2025 at 1:48 PM Saulius Valatka <saulius...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > If I understand correctly, once an Iceberg table is mutated outside
> > > of Impala, one has to run a REFRESH or INVALIDATE METADATA statement.
> > > We noticed that running REFRESH on huge tables can take minutes, and
> > > while that is happening, querying them is blocked. We have large
> > > event tables that are updated very frequently in real time, and by
> > > default we run a REFRESH after each update, so effectively such
> > > tables are un-queryable, as they're constantly being refreshed.
> > >
> > > Is there something I'm missing? What would the recommendation here be?