Hi, sorry, maybe I worded my question wrong: I understand that refreshing is needed (either automatic or manual). My main concerns are the latency of the refresh and the fact that the table is not queryable while it's being refreshed. For large tables that are updated frequently, this combination makes them essentially un-queryable.
On Wed, 2025-01-08 at 15:17, Gabor Kaszab <gaborkas...@apache.org> wrote:
> Hi,
>
> I don't think that the issue you describe is specific to Iceberg, in the sense
> that even for Hive tables, if you make changes using an engine that doesn't
> trigger HMS events, one has to issue refresh/invalidate metadata to see the
> changes reflected in Impala.
> Could you share what catalog you use for your Iceberg tables? And what tool
> do you use for data ingestion into these tables?
> If you use the HMS-backed HiveCatalog as a catalog and an engine that
> triggers HMS notifications, like Spark or Hive, then even for Iceberg tables
> you can avoid executing refresh manually.
>
> Gabor
>
> On Wed, Jan 8, 2025 at 1:48 PM Saulius Valatka <saulius...@gmail.com>
> wrote:
>
> > Hi,
> >
> > If I understand correctly, once an Iceberg table is mutated outside of
> > Impala one has to run a refresh or invalidate statement. We noticed that
> > running refresh on huge tables can take minutes, and while that is
> > happening, querying them is blocked. We have large event tables that are
> > being updated very frequently in real time; by default we run a refresh
> > after each update, so effectively this means such tables are un-queryable,
> > as they're constantly being refreshed.
> >
> > Is there something I'm missing? What would the recommendation here be?
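[For context, a minimal sketch of the manual metadata-sync statements being discussed. These are standard Impala SQL statements; the table name `db.events` is a placeholder, not from the thread.]

```sql
-- Incrementally reload one table's file and partition metadata after it was
-- changed outside Impala (e.g. by a Spark writer). This is the statement
-- whose latency and query-blocking behavior the thread is about.
REFRESH db.events;

-- Heavier alternative: discard the cached metadata for the table entirely,
-- so the next query pays the full metadata-load cost.
INVALIDATE METADATA db.events;
```

If the writer goes through an HMS-backed catalog and emits HMS notification events, Impala can pick up changes automatically and these manual statements can usually be avoided, as Gabor notes above.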