Thanks Saulius, this is very interesting. Trino seems to be much better optimized for small queries on huge tables. This is definitely room for improvement for us.
Where most of the time was being spent:
- Metadata of all 2 tables cached: 42s528ms (42s528ms)
- Single node plan created: 1m13s (30s493ms)
- Completed admission: 1m55s (28s320ms)

Since the query has a LIMIT 10 clause, we could probably short circuit the first two (I guess there are no delete files), and then admission would probably also be faster.
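For clarity, the two probes being compared are roughly the statements below; the table and partition column names are just placeholders, since we don't know your schema:

    -- Unfiltered probe from your report (~2 minutes in Impala, ~2 seconds in Trino):
    SELECT * FROM iceberg_db.events LIMIT 10;

    -- With a partition predicate for the last day (~10 seconds in your test):
    SELECT * FROM iceberg_db.events
    WHERE event_date = '2025-01-08'
    LIMIT 10;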
Is it possible to filter the coordinator logs for the query id and send it to us, provided it doesn't contain any sensitive information (column names, etc.)? That'll give us more hints about where the time was spent.

Cheers,
Zoltan

On Thu, Jan 9, 2025 at 12:47 PM Saulius Valatka <saulius...@gmail.com> wrote:

> Yeah, sorry, I was probably wrong about the "blocked" part: it's the select statement without a filter that's taking very long, regardless of any running refreshes in parallel. Adding a partition filter for the last day brings down the execution time to ~10 seconds (still way slower than Trino, which can do it in ~2 seconds).
>
> 2025-01-09, Thu, 13:31 Saulius Valatka <saulius...@gmail.com> wrote:
>
> > The table being "blocked" is just my hypothesis, maybe I'm wrong? I can see that when a REFRESH statement is running, issuing a simple "select * from table limit 10" sits in the "CREATED Query submitted" state until roughly when the refresh statement finishes.
> > Just out of curiosity I launched a Trino cluster on the same servers as Impala and hooked it up to the same Iceberg catalog; issuing an identical "select * from table limit 10" runs in ~2 seconds, whereas in Impala it's ~2 minutes.
> >
> > Here are the details for the select query execution:
> >
> > Query Compilation: 1m14s
> >   - Metadata of all 2 tables cached: 42s528ms (42s528ms)
> >   - Analysis finished: 42s539ms (10.112ms)
> >   - Authorization finished (ranger): 42s540ms (1.246ms)
> >   - Value transfer graph computed: 42s540ms (609.596us)
> >   - Single node plan created: 1m13s (30s493ms)
> >   - Runtime filters computed: 1m13s (44.325us)
> >   - Distributed plan created: 1m13s (29.075us)
> >   - Parallel plans created: 1m13s (266.400us)
> >   - Planning finished: 1m14s (1s727ms)
> > Query Timeline: 2m13s
> >   - Query submitted: 50.627us (50.627us)
> >   - Planning finished: 1m27s (1m27s)
> >   - Submit for admission: 1m27s (273.383us)
> >   - Completed admission: 1m55s (28s320ms)
> >   - Ready to start on 121 backends: 1m55s (91.685ms)
> >   - All 121 execution backends (1921 fragment instances) started: 2m7s (11s578ms)
> >   - Rows available: 2m7s (1.412ms)
> >   - First row fetched: 2m7s (291.051ms)
> >   - Last row fetched: 2m7s (225.054ms)
> >   - Released admission control resources: 2m13s (5s433ms)
> >   - Unregister query: 2m13s (100.938ms)
> >
> > Catalog Server Operation: 49s120ms
> >   - Got Metastore client: 3.116us (3.116us)
> >   - Got catalog version read lock: 2.250ms (2.247ms)
> >   - Got catalog version write lock and table write lock: 2.315ms (65.074us)
> >   - Got Metastore client: 2.320ms (4.258us)
> >   - Fetched table from Metastore: 12.885ms (10.565ms)
> >   - Loaded Iceberg API table: 72.570ms (59.684ms)
> >   - Loaded schema from Iceberg: 72.751ms (181.127us)
> >   - Loaded Iceberg files: 7s925ms (7s852ms)
> >   - Loaded all column stats: 7s953ms (28.404ms)
> >   - Loaded table schema: 7s958ms (5.226ms)
> >   - Start refreshing file metadata: 7s959ms (431.115us)
> >   - Loaded file metadata for 1 partitions: 22s081ms (14s122ms)
> >   - Loaded all column stats: 49s120ms (27s038ms)
> >   - Loaded table: 49s120ms (30.027us)
> >   - Finished resetMetadata request: 49s120ms (312.849us)
> > Query Compilation: 53s727ms
> >   - Metadata of all 2 tables cached: 53s726ms (53s726ms)
> >   - Analysis finished: 53s726ms (67.048us)
> >   - Authorization finished (ranger): 53s727ms (422.569us)
> >   - Planning finished: 53s727ms (137.463us)
> > Query Timeline: 1m44s
> >   - Query submitted: 42.071us (42.071us)
> >   - Planning finished: 53s735ms (53s735ms)
> >   - CatalogDdlRequest finished: 1m42s (49s122ms)
> >   - Applied catalog updates from DDL: 1m43s (1s049ms)
> >   - Request finished: 1m43s (14.695ms)
> >   - Unregister query: 1m44s (912.626ms)
> >
> > 2025-01-09, Thu, 12:33 Zoltán Borók-Nagy <borokna...@cloudera.com> wrote:
> >
> >> Thanks for the update.
> >>
> >> So the whole REFRESH operation took 1m50s. Of this, CatalogDdlRequest was only 46s313ms. This 46s313ms is the time when the table is blocked, right?
> >> Within CatalogDdlRequest the longest operation was loading column stats, which took 27s300ms. This is a single RPC (getTableColumnStatistics()) toward HMS, so it would be good to know why it took so long, especially given that loading file metadata for this huge table was around 19 seconds.
> >> It's also interesting that "Loaded all column stats" appears twice in the catalog timeline. At first it took 11.697ms, and the second invocation was the one that took 27s300ms. Hopefully we can get rid of the second invocation, but that'll require a code change.
> >>
> >> I also wonder why the table is not queryable from the Coordinator cache while it is being reloaded in CatalogD; I hope we can fix this as well.
> >>
> >> Is the table expected to grow indefinitely? Or do you drop/relocate old partitions after some time?
> >>
> >> Cheers,
> >> Zoltan
> >>
> >>
> >> On Wed, Jan 8, 2025 at 9:19 PM Saulius Valatka <saulius...@gmail.com> wrote:
> >>
> >> > Hi,
> >> >
> >> > so I just tried applying IMPALA-13254 on top of 4.4.1, redeployed, and refresh times for the largest table went down from ~80 minutes to ~2 minutes!
> >> > That's waaay better, but still not ideal: if we issue a refresh every 15 minutes, there's still a lot of time the table is blocked for a minute or two, but at least now it's queryable.
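> >> > (For reference, the statement being timed here is just a plain table-level refresh; the table name below is a placeholder:
> >> >
> >> >     REFRESH iceberg_db.events;
> >> >
> >> > issued every ~15 minutes as new data lands.)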
> >> >
> >> > Here's an example REFRESH timeline:
> >> >
> >> > Catalog Server Operation: 46s213ms
> >> >   - Got Metastore client: 5.771us (5.771us)
> >> >   - Got catalog version read lock: 2.334ms (2.328ms)
> >> >   - Got catalog version write lock and table write lock: 2.459ms (125.470us)
> >> >   - Got Metastore client: 2.465ms (6.181us)
> >> >   - Fetched table from Metastore: 12.831ms (10.366ms)
> >> >   - Loaded Iceberg API table: 139.024ms (126.192ms)
> >> >   - Loaded schema from Iceberg: 139.175ms (150.949us)
> >> >   - Loaded Iceberg files: 5s036ms (4s897ms)
> >> >   - Loaded all column stats: 5s047ms (11.697ms)
> >> >   - Loaded table schema: 5s053ms (5.279ms)
> >> >   - Start refreshing file metadata: 5s053ms (291.959us)
> >> >   - Loaded file metadata for 1 partitions: 18s912ms (13s859ms)
> >> >   - Loaded all column stats: 46s213ms (27s300ms)
> >> >   - Loaded table: 46s213ms (36.600us)
> >> >   - Finished resetMetadata request: 46s213ms (485.329us)
> >> > Query Compilation: 1m3s
> >> >   - Metadata of all 2 tables cached: 1m3s (1m3s)
> >> >   - Analysis finished: 1m3s (237.705us)
> >> >   - Authorization finished (ranger): 1m3s (808.267us)
> >> >   - Planning finished: 1m3s (581.713us)
> >> > Query Timeline: 1m51s
> >> >   - Query submitted: 40.227us (40.227us)
> >> >   - Planning finished: 1m3s (1m3s)
> >> >   - CatalogDdlRequest finished: 1m50s (46s313ms)
> >> >   - Applied catalog updates from DDL: 1m50s (24.206ms)
> >> >   - Request finished: 1m50s (202.949us)
> >> >   - Unregister query: 1m51s (763.412ms)
> >> >
> >> >
> >> > 2025-01-08, Wed, 17:50 Zoltán Borók-Nagy <borokna...@cloudera.com> wrote:
> >> >
> >> > > Thanks for the info, Saulius.
> >> > >
> >> > > If you try out IMPALA-13254, please let us know how much it helps in your case.
> >> > > Hopefully it speeds up table loading times enough so it won't cause too much turbulence.
> >> > > Some table loading statistics would also be helpful to know where the time is being spent.
> >> > >
> >> > > Do you use local catalog mode?
> >> > > https://impala.apache.org/docs/build/html/topics/impala_metadata.html
> >> > > I'm not sure how much it will help, but it could be worth trying out.
> >> > >
> >> > > Cheers,
> >> > > Zoltan
> >> > >
> >> > >
> >> > > On Wed, Jan 8, 2025 at 2:45 PM Saulius Valatka <saulius...@gmail.com> wrote:
> >> > >
> >> > > > Hi,
> >> > > >
> >> > > > sorry, maybe I worded my question wrong: I understand that refreshing is needed (either automatic or manual); my main concerns are the latency of the refresh and the fact that the table is not queryable while it's being refreshed - for large tables that are being updated frequently this combination makes them essentially un-queryable.
> >> > > >
> >> > > > 2025-01-08, Wed, 15:17 Gabor Kaszab <gaborkas...@apache.org> wrote:
> >> > > >
> >> > > > > Hi,
> >> > > > >
> >> > > > > I don't think that the issue you describe is specific to Iceberg, in the sense that even for Hive tables, if you make changes using an engine that doesn't trigger HMS events, one has to issue refresh/invalidate metadata to see the changes reflected in Impala.
> >> > > > > Could you share what catalog you use for your Iceberg tables? And what tool do you use for data ingestion into these tables?
> >> > > > > If you use the HMS-backed HiveCatalog as a catalog and an engine that triggers HMS notifications, like Spark or Hive, then even for Iceberg tables you can avoid executing refresh manually.
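> >> > > > > For example, with Spark configured to use an Iceberg catalog of type "hive" backed by the same HMS (the catalog, database and table names below are placeholders), a write such as
> >> > > > >
> >> > > > >     -- Spark SQL sketch; `hms_catalog` is assumed to be an Iceberg
> >> > > > >     -- HiveCatalog registered against the shared metastore.
> >> > > > >     INSERT INTO hms_catalog.analytics.events
> >> > > > >     SELECT * FROM staging_events;
> >> > > > >
> >> > > > > is committed through HMS, and the resulting notification events are what let Impala pick up the change without a manual REFRESH.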
> >> > > > >
> >> > > > > Gabor
> >> > > > >
> >> > > > > On Wed, Jan 8, 2025 at 1:48 PM Saulius Valatka <saulius...@gmail.com> wrote:
> >> > > > >
> >> > > > > > Hi,
> >> > > > > >
> >> > > > > > If I understand correctly, once an Iceberg table is mutated outside of Impala one has to run a refresh or invalidate statement. We noticed that running refresh on huge tables can take minutes and while that is happening querying them is blocked. We have large event tables that are being updated very frequently in real-time, by default we run a refresh after each update, so effectively this means such tables are un-queryable, as they're constantly being refreshed.
> >> > > > > >
> >> > > > > > Is there something I'm missing? What would the recommendation here be?