Opened the following tickets to track the items that came out of this thread:
1. IMPALA-13651 <https://issues.apache.org/jira/browse/IMPALA-13651>: Iceberg tables load column stats twice
2. IMPALA-13652 <https://issues.apache.org/jira/browse/IMPALA-13652>: Optimize plain SELECT LIMIT queries for Iceberg tables

Cheers,
Zoltan

On Thu, Jan 9, 2025 at 2:13 PM Saulius Valatka <saulius...@gmail.com> wrote:

sure, attached the logs

On 2025-01-09 (Thu) at 14:02, Zoltán Borók-Nagy <borokna...@cloudera.com> wrote:

Thanks Saulius, this is very interesting. Trino seems to be much better optimized for small queries on huge tables. This is definitely room for improvement for us.

Where most of the time was being spent:
  - Metadata of all 2 tables cached: 42s528ms (42s528ms)
  - Single node plan created: 1m13s (30s493ms)
  - Completed admission: 1m55s (28s320ms)

Since the query has a LIMIT 10 clause, we could probably short-circuit the first two (I guess there are no delete files), and then admission would probably also be faster.

Is it possible to filter the coordinator logs for the query id and send them to us, if they don't contain any sensitive information (column names, etc.)? That'll provide us with more hints about where the time was spent.

Cheers,
Zoltan

On Thu, Jan 9, 2025 at 12:47 PM Saulius Valatka <saulius...@gmail.com> wrote:

Yeah, sorry, I was probably wrong about the "blocked" part: it's the select statement without a filter that's taking very long, regardless of any refreshes running in parallel. Adding a partition filter for the last day brings the execution time down to ~10 seconds (still well short of Trino, which can do it in ~2 seconds).

On 2025-01-09 (Thu) at 13:31, Saulius Valatka <saulius...@gmail.com> wrote:

The table being "blocked" is just my hypothesis, maybe I'm wrong. I can see that while a REFRESH statement is running, a simple "select * from table limit 10" sits in the "CREATED Query submitted" state until roughly when the refresh statement finishes.
Just out of curiosity I launched a Trino cluster on the same servers as Impala and hooked it up to the same Iceberg catalog: an identical "select * from table limit 10" runs in ~2 seconds there, whereas in Impala it takes ~2 minutes.
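For reference, the two query shapes being compared look roughly like this; events_table and event_date are made-up placeholder names, not the real schema:

    -- Plain LIMIT query: ~2 seconds in Trino vs ~2 minutes in Impala on the same table
    SELECT * FROM events_table LIMIT 10;

    -- The same query with a partition filter for the last day: ~10 seconds in Impala
    SELECT * FROM events_table
    WHERE event_date >= '2025-01-08'
    LIMIT 10;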
Here are the details for the select query execution:

Query Compilation: 1m14s
  - Metadata of all 2 tables cached: 42s528ms (42s528ms)
  - Analysis finished: 42s539ms (10.112ms)
  - Authorization finished (ranger): 42s540ms (1.246ms)
  - Value transfer graph computed: 42s540ms (609.596us)
  - Single node plan created: 1m13s (30s493ms)
  - Runtime filters computed: 1m13s (44.325us)
  - Distributed plan created: 1m13s (29.075us)
  - Parallel plans created: 1m13s (266.400us)
  - Planning finished: 1m14s (1s727ms)
Query Timeline: 2m13s
  - Query submitted: 50.627us (50.627us)
  - Planning finished: 1m27s (1m27s)
  - Submit for admission: 1m27s (273.383us)
  - Completed admission: 1m55s (28s320ms)
  - Ready to start on 121 backends: 1m55s (91.685ms)
  - All 121 execution backends (1921 fragment instances) started: 2m7s (11s578ms)
  - Rows available: 2m7s (1.412ms)
  - First row fetched: 2m7s (291.051ms)
  - Last row fetched: 2m7s (225.054ms)
  - Released admission control resources: 2m13s (5s433ms)
  - Unregister query: 2m13s (100.938ms)

Catalog Server Operation: 49s120ms
  - Got Metastore client: 3.116us (3.116us)
  - Got catalog version read lock: 2.250ms (2.247ms)
  - Got catalog version write lock and table write lock: 2.315ms (65.074us)
  - Got Metastore client: 2.320ms (4.258us)
  - Fetched table from Metastore: 12.885ms (10.565ms)
  - Loaded Iceberg API table: 72.570ms (59.684ms)
  - Loaded schema from Iceberg: 72.751ms (181.127us)
  - Loaded Iceberg files: 7s925ms (7s852ms)
  - Loaded all column stats: 7s953ms (28.404ms)
  - Loaded table schema: 7s958ms (5.226ms)
  - Start refreshing file metadata: 7s959ms (431.115us)
  - Loaded file metadata for 1 partitions: 22s081ms (14s122ms)
  - Loaded all column stats: 49s120ms (27s038ms)
  - Loaded table: 49s120ms (30.027us)
  - Finished resetMetadata request: 49s120ms (312.849us)
Query Compilation: 53s727ms
  - Metadata of all 2 tables cached: 53s726ms (53s726ms)
  - Analysis finished: 53s726ms (67.048us)
  - Authorization finished (ranger): 53s727ms (422.569us)
  - Planning finished: 53s727ms (137.463us)
Query Timeline: 1m44s
  - Query submitted: 42.071us (42.071us)
  - Planning finished: 53s735ms (53s735ms)
  - CatalogDdlRequest finished: 1m42s (49s122ms)
  - Applied catalog updates from DDL: 1m43s (1s049ms)
  - Request finished: 1m43s (14.695ms)
  - Unregister query: 1m44s (912.626ms)

On 2025-01-09 (Thu) at 12:33, Zoltán Borók-Nagy <borokna...@cloudera.com> wrote:

Thanks for the update.

So the whole REFRESH operation took 1m50s. Of this, CatalogDdlRequest was only 46s313ms. This 46s313ms is the time when the table is blocked, right? Within CatalogDdlRequest the longest operation was loading column stats, which took 27s300ms. This is a single RPC (getTableColumnStatistics()) towards HMS; it would be good to know why it took so long, especially given that loading file metadata for this huge table took around 19 seconds.
It's also interesting that "Loaded all column stats" appears twice in the catalog timeline. At first it took 11.697ms, and the second invocation was the one that took 27s300ms.
Hopefully we can get rid of the second invocation, but that'll require a code change.

I also wonder why the table is not queryable from the Coordinator cache while it is being reloaded in CatalogD; I hope we can fix this as well.

Is the table expected to grow indefinitely? Or do you drop/relocate old partitions after some time?

Cheers,
Zoltan

On Wed, Jan 8, 2025 at 9:19 PM Saulius Valatka <saulius...@gmail.com> wrote:

Hi,

so I just tried applying IMPALA-13254 on top of 4.4.1, redeployed, and refresh times for the largest table went down from ~80 minutes to ~2 minutes!
That's waaay better, but still not ideal: if we issue a refresh every 15 minutes, the table still spends a fair amount of time blocked, a minute or two per refresh, but at least now it's queryable.

Here's an example REFRESH timeline:

Catalog Server Operation: 46s213ms
  - Got Metastore client: 5.771us (5.771us)
  - Got catalog version read lock: 2.334ms (2.328ms)
  - Got catalog version write lock and table write lock: 2.459ms (125.470us)
  - Got Metastore client: 2.465ms (6.181us)
  - Fetched table from Metastore: 12.831ms (10.366ms)
  - Loaded Iceberg API table: 139.024ms (126.192ms)
  - Loaded schema from Iceberg: 139.175ms (150.949us)
  - Loaded Iceberg files: 5s036ms (4s897ms)
  - Loaded all column stats: 5s047ms (11.697ms)
  - Loaded table schema: 5s053ms (5.279ms)
  - Start refreshing file metadata: 5s053ms (291.959us)
  - Loaded file metadata for 1 partitions: 18s912ms (13s859ms)
  - Loaded all column stats: 46s213ms (27s300ms)
  - Loaded table: 46s213ms (36.600us)
  - Finished resetMetadata request: 46s213ms (485.329us)
Query Compilation: 1m3s
  - Metadata of all 2 tables cached: 1m3s (1m3s)
  - Analysis finished: 1m3s (237.705us)
  - Authorization finished (ranger): 1m3s (808.267us)
  - Planning finished: 1m3s (581.713us)
Query Timeline: 1m51s
  - Query submitted: 40.227us (40.227us)
  - Planning finished: 1m3s (1m3s)
  - CatalogDdlRequest finished: 1m50s (46s313ms)
  - Applied catalog updates from DDL: 1m50s (24.206ms)
  - Request finished: 1m50s (202.949us)
  - Unregister query: 1m51s (763.412ms)

On 2025-01-08 (Wed) at 17:50, Zoltán Borók-Nagy <borokna...@cloudera.com> wrote:

Thanks for the info, Saulius.

If you try out IMPALA-13254, please let us know how much it helps in your case.
Hopefully it speeds up table loading times enough so it won't cause too much turbulence.
Some table loading statistics would also be helpful, to know where the time is being spent.

Do you use local catalog mode?
https://impala.apache.org/docs/build/html/topics/impala_metadata.html
I'm not sure how much it will help, but it could be worth trying out.
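For reference, a minimal sketch of how local catalog (on-demand metadata) mode is enabled via startup flags, based on the page above; please double-check the exact flag names for your version:

    # On every coordinator (impalad):
    --use_local_catalog=true

    # On the catalogd:
    --catalog_topic_mode=minimal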
Cheers,
Zoltan

On Wed, Jan 8, 2025 at 2:45 PM Saulius Valatka <saulius...@gmail.com> wrote:

Hi,

sorry, maybe I worded my question wrong: I understand that refreshing is needed (either automatic or manual); my main concerns are the latency of the refresh and the fact that the table is not queryable while it's being refreshed. For large tables that are updated frequently, this combination makes them essentially unqueryable.

On 2025-01-08 (Wed) at 15:17, Gabor Kaszab <gaborkas...@apache.org> wrote:

Hi,

I don't think the issue you describe is specific to Iceberg, in the sense that even for Hive tables, if you make changes using an engine that doesn't trigger HMS events, one has to issue refresh/invalidate metadata to see the changes reflected in Impala.
Could you share what catalog you use for your Iceberg tables? And what tool do you use for data ingestion into these tables?
If you use the HMS-backed HiveCatalog as a catalog and an engine that triggers HMS notifications, like Spark or Hive, then even for Iceberg tables you can avoid executing refresh manually.

Gabor

On Wed, Jan 8, 2025 at 1:48 PM Saulius Valatka <saulius...@gmail.com> wrote:

Hi,

If I understand correctly, once an Iceberg table is mutated outside of Impala one has to run a refresh or invalidate statement. We noticed that running refresh on huge tables can take minutes, and while that is happening querying them is blocked. We have large event tables that are being updated very frequently in real time; by default we run a refresh after each update, so effectively this means such tables are unqueryable, as they're constantly being refreshed.

Is there something I'm missing? What would the recommendation here be?
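For reference, the refresh/invalidate statements discussed throughout this thread take this form; iceberg_db.events is a made-up placeholder name:

    -- Reload file and partition metadata for a single table after an external write
    REFRESH iceberg_db.events;

    -- Mark the table's cached metadata as stale; it gets reloaded on the next access
    INVALIDATE METADATA iceberg_db.events;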