My understanding is that automatic refresh is also supposed to be more efficient about performing catalog updates, so it would be worth experimenting with it.
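For reference, a rough sketch of the knobs involved in event-based automatic metadata sync (the flag and property names are from the Impala/Hive docs; the specific values here are illustrative assumptions, not recommendations):

```shell
# Hive Metastore side (hive-site.xml, shown as key=value for brevity):
# emit notification events when tables/partitions change, so Impala can
# pick up external writes without a manual REFRESH.
#   hive.metastore.transactional.event.listeners=org.apache.hive.hcatalog.listener.DbNotificationListener
#   hive.metastore.dml.events=true

# Impala side: have catalogd poll the HMS notification log.
# A polling interval > 0 enables automatic invalidation/refresh;
# 2 seconds is just an example value.
catalogd --hms_event_polling_interval_s=2
```

With this enabled, writes that go through an engine that fires HMS events (e.g. Hive or Spark with the HiveCatalog) should become visible in Impala without issuing REFRESH per update, which is what the suggestion above is getting at.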
On Wed, Jan 8, 2025 at 5:45 AM Saulius Valatka <saulius...@gmail.com> wrote:

> Hi,
>
> Sorry, maybe I worded my question wrong: I understand that refreshing is
> needed (either automatic or manual). My main concerns are the latency of
> the refresh and the fact that the table is not queryable while it's being
> refreshed. For large tables that are updated frequently, this combination
> makes them essentially un-queryable.
>
> On Wed, 2025-01-08 at 15:17, Gabor Kaszab <gaborkas...@apache.org> wrote:
>
> > Hi,
> >
> > I don't think the issue you describe is specific to Iceberg, in the
> > sense that even for Hive tables, if you make changes using an engine
> > that doesn't trigger HMS events, one has to issue REFRESH or
> > INVALIDATE METADATA to see the changes reflected in Impala.
> > Could you share what catalog you use for your Iceberg tables? And what
> > tool do you use for data ingestion into these tables?
> > If you use the HMS-backed HiveCatalog as a catalog and an engine that
> > triggers HMS notifications, like Spark or Hive, then even for Iceberg
> > tables you can avoid executing REFRESH manually.
> >
> > Gabor
> >
> > On Wed, Jan 8, 2025 at 1:48 PM Saulius Valatka <saulius...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > If I understand correctly, once an Iceberg table is mutated outside
> > > of Impala, one has to run a REFRESH or INVALIDATE METADATA statement.
> > > We noticed that running REFRESH on huge tables can take minutes, and
> > > while that is happening, querying them is blocked. We have large
> > > event tables that are updated very frequently in real time, and by
> > > default we run a REFRESH after each update, so effectively such
> > > tables are un-queryable, as they're constantly being refreshed.
> > >
> > > Is there something I'm missing? What would the recommendation here be?