My understanding is that automatic refresh is also supposed to be more
efficient about performing catalog updates, so it'd be worth experimenting
with it.
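For reference, Impala's event-based automatic metadata refresh is turned on via a catalogd startup flag; a minimal sketch (the flag name is from the Impala docs, the 2-second polling interval is just an illustrative value, and the rest of the startup command line is elided):

```shell
# Start catalogd with HMS notification-event polling enabled.
# catalogd then applies changes incrementally from the event stream
# instead of requiring a full manual REFRESH of each changed table.
catalogd --hms_event_polling_interval_s=2
```

Note that this only helps if the writing engines (Spark, Hive, etc.) actually emit HMS notification events, as discussed below.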

On Wed, Jan 8, 2025 at 5:45 AM Saulius Valatka <saulius...@gmail.com> wrote:

> Hi,
>
> sorry, maybe I worded my question poorly: I understand that refreshing is
> needed (either automatic or manual); my main concerns are the latency of
> the refresh and the fact that the table is not queryable while it's being
> refreshed - for large tables that are updated frequently, this combination
> makes them essentially un-queryable.
>
> On Wed, 2025-01-08 at 15:17, Gabor Kaszab <gaborkas...@apache.org> wrote:
>
> > Hi,
> >
> > I don't think the issue you describe is specific to Iceberg, in the
> > sense that even for Hive tables, if you make changes with an engine that
> > doesn't trigger HMS events, one has to issue REFRESH / INVALIDATE
> > METADATA to see the changes reflected in Impala.
> > Could you share which catalog you use for your Iceberg tables, and which
> > tool you use for data ingestion into them?
> > If you use the HMS-backed HiveCatalog and an engine that triggers HMS
> > notifications, like Spark or Hive, then even for Iceberg tables you can
> > avoid running REFRESH manually.
> >
> > Gabor
> >
> > On Wed, Jan 8, 2025 at 1:48 PM Saulius Valatka <saulius...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > If I understand correctly, once an Iceberg table is mutated outside of
> > > Impala, one has to run a REFRESH or INVALIDATE METADATA statement. We
> > > noticed that running REFRESH on huge tables can take minutes, and while
> > > that is happening, querying them is blocked. We have large event tables
> > > that are updated very frequently in real time; by default we run a
> > > refresh after each update, so effectively such tables are un-queryable,
> > > as they're constantly being refreshed.
> > >
> > > Is there something I'm missing? What would the recommendation here be?
> > >
> >
>
