Hey Saulius,

Thanks for reaching out.

> once an Iceberg table is mutated outside of Impala one has to run a refresh
> or invalidate statement

If the Iceberg table lives in the HiveCatalog and Automatic
Invalidation/Refresh of Metadata is enabled, then you don't need to run them
manually: Impala will eventually pick up the new table state.
See https://impala.apache.org/docs/build/html/topics/impala_metadata.html
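Roughly, it's driven by HMS event processing on the catalogd; as a sketch
(double-check the flag and its default for your exact version in the docs
above):

    # catalogd startup flag: poll HMS notification events every 2 seconds,
    # 0 disables event processing
    --hms_event_polling_interval_s=2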

> We noticed that running refresh on huge tables can take minutes and while
> that is happening querying them is blocked
>
Which version are you using? There have been quite a few improvements in that
area lately, though the major one is coming in 4.5.0: IMPALA-13254
<https://issues.apache.org/jira/browse/IMPALA-13254>

> We have large event tables that are being updated very frequently in
> real-time
>
I'm a bit curious what "very frequently" means here. Could you share some
numbers?

> What would the recommendation here be?
>
Until Impala 4.5 you can try reducing the frequency of table updates. The
number of files also plays a huge role in table loading times, so you could
try compacting the table from time to time (see the sketch below).
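If you happen to have Spark with the Iceberg extensions at hand (the catalog
and table names below are just placeholders), Iceberg's rewrite_data_files
procedure can merge the small files, e.g.:

    -- Spark SQL sketch, not Impala syntax; adjust names to your setup
    CALL my_catalog.system.rewrite_data_files(table => 'db.events');
    -- optionally expire old snapshots afterwards to drop the replaced files
    CALL my_catalog.system.expire_snapshots(table => 'db.events');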

Cheers,
    Zoltan


On Wed, Jan 8, 2025 at 1:48 PM Saulius Valatka <saulius...@gmail.com> wrote:

> Hi,
>
> If I understand correctly, once an Iceberg table is mutated outside of
> Impala one has to run a refresh or invalidate statement. We noticed that
> running refresh on huge tables can take minutes and while that is happening
> querying them is blocked. We have large event tables that are being updated
> very frequently in real-time, by default we run a refresh after each
> update, so effectively this means such tables are un-queryable, as they're
> constantly being refreshed.
>
> Is there something I'm missing? What would the recommendation here be?
>
