Hi, sorry, maybe I worded my question wrong: I understand that refreshing is needed (either automatic or manual). My main concerns are the latency of the refresh and the fact that the table is not queryable while it's being refreshed. For large tables that are updated frequently, this combination makes them essentially un-queryable.
On Wed, 2025-01-08 at 15:17, Gabor Kaszab <gaborkas...@apache.org> wrote:
> Hi,
>
> I don't think that the issue you describe is specific to Iceberg, in the sense
> that even for Hive tables, if you make changes using an engine that doesn't
> trigger HMS events, one has to issue refresh/invalidate metadata to see the
> changes reflected in Impala.
> Could you share what catalog you use for your Iceberg tables? And what tool
> do you use for data ingestion into these tables?
> If you use the HMS-backed HiveCatalog as a catalog and an engine that
> triggers HMS notifications, like Spark or Hive, then even for Iceberg tables
> you can avoid executing refresh manually.
>
> Gabor
>
> On Wed, Jan 8, 2025 at 1:48 PM Saulius Valatka <saulius...@gmail.com>
> wrote:
>
> > Hi,
> >
> > If I understand correctly, once an Iceberg table is mutated outside of
> > Impala one has to run a refresh or invalidate statement. We noticed that
> > running refresh on huge tables can take minutes, and while that is
> > happening, querying them is blocked. We have large event tables that are
> > being updated very frequently in real time; by default we run a refresh
> > after each update, so effectively this means such tables are un-queryable,
> > as they're constantly being refreshed.
> >
> > Is there something I'm missing? What would the recommendation here be?
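[For context, a minimal sketch of the manual metadata-sync statements being discussed. These are standard Impala SQL statements; the table name `db.events` is a placeholder, not from the thread.]

```sql
-- Incrementally reload one table's file and partition metadata after it was
-- changed outside Impala (e.g. by a Spark writer). This is the statement
-- whose latency and query-blocking behavior the thread is about.
REFRESH db.events;

-- Heavier alternative: discard the cached metadata for the table entirely,
-- so the next query pays the full metadata-load cost.
INVALIDATE METADATA db.events;
```

If the writer goes through an HMS-backed catalog and emits HMS notification events, Impala can pick up changes automatically and these manual statements can usually be avoided, as Gabor notes above.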