[
https://issues.apache.org/jira/browse/IMPALA-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679214#comment-17679214
]
Zoltán Borók-Nagy commented on IMPALA-11701:
--------------------------------------------
Hi [~libra_816], thank you for reporting this issue.
The long planning time may have been resolved since then.
However, it seems that we don't apply the count(*) optimization in this case.
We'll look into it.
> Slow query problem about querying iceberg table by impala
> ---------------------------------------------------------
>
> Key: IMPALA-11701
> URL: https://issues.apache.org/jira/browse/IMPALA-11701
> Project: IMPALA
> Issue Type: Bug
> Reporter: Qizhu Chan
> Priority: Critical
> Labels: impala-iceberg
> Attachments: image-2022-11-03-17-37-14-712.png,
> profile_cf446a1ab3a5e852_1b1005de00000000.txt
>
>
> I use Impala to query an Iceberg table, but the query performance is poor:
> compared with querying a Hive-format table containing the same data, the
> query takes dozens of times longer.
> The SQL statement is a very simple aggregate query, for example:
> select count(*) from `db_name`.tbl_name where datekey='20221001' and
> event='xxx'
> ('datekey' and 'event' are the partition fields)
> I expected that Impala could fetch Iceberg's metadata stats and return
> results very quickly, but it doesn't.
> The Iceberg table uses a catalog of the hadoop type, and Impala accesses it
> through an external table created in Hive. Incidentally, snapshot
> expiration and data compaction run on the Iceberg table daily, so there
> should be no small-file problem.
> I found this warning using the EXPLAIN statement:
> {code:java}
> | WARNING: The following tables are missing relevant table and/or column statistics. |
> | iceberg.gamebox_event_iceberg
> {code}
> Query: SHOW TABLE STATS `iceberg`.gamebox_event_iceberg
> +-------+--------+--------+--------------+-------------------+---------+-------------------+------------------------------------------------------+
> | #Rows | #Files | Size   | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                                             |
> +-------+--------+--------+--------------+-------------------+---------+-------------------+------------------------------------------------------+
> | 0     | 590509 | 1.91TB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs:///hive/warehouse/iceberg/gamebox_event_iceberg |
> +-------+--------+--------+--------------+-------------------+---------+-------------------+------------------------------------------------------+
> It seems that Impala is not syncing Iceberg's table and column statistics.
> I'm not sure whether this is related to the slow queries.
> As shown in the screenshot, I think the query time is mainly spent on
> planning and on the execution backends, but I don't know why these two
> phases take so long.
> The complete profile for this query is attached.
> How can I speed up the query? Can someone help with my question, please?
> !image-2022-11-03-17-37-14-712.png!