[
https://issues.apache.org/jira/browse/IMPALA-12509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779783#comment-17779783
]
Fu Lili commented on IMPALA-12509:
----------------------------------
First, we encountered a slow query problem in a customer environment. This
customer had an Iceberg table with 200,000 files and thousands of partitions.
We found that even a simple SELECT COUNT(*) on a single partition took several
seconds. From the profile, we could see that the average backend startup time
reached 1s, which is unusual. After reading through the code, we suspected
that serialization of TQueryCtx was the only likely cause. Because of the
customer's security concerns, we cannot provide the relevant logs or profile
screenshots here.
Then we constructed an Iceberg table with 4000 files in the test environment
and found that the size of TQueryCtx reached 2MB. The size clearly grows with
the number of files, so this is very likely the source of the problem.
!image-2023-10-26-15-34-28-254.png!
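To illustrate the scaling described above, here is a minimal toy sketch. It is
not Impala code: the field names, the file paths, and the use of JSON in place
of Thrift are all assumptions made for illustration. It only shows that a
descriptor carrying one entry per data file grows with the file count, while a
descriptor without the file list stays the same size.

```python
import json


def make_table_descriptor(num_files, include_files=True):
    """Build a toy table descriptor. All field names here are hypothetical."""
    desc = {
        "table_name": "db.iceberg_tbl",
        "columns": ["id", "ts", "value"],
    }
    if include_files:
        # One entry per data file: this is the part that scales with the table.
        desc["files"] = [
            {
                "path": f"/warehouse/db/iceberg_tbl/data/part-{i:06d}.parquet",
                "size_bytes": 1024 * 1024,
                "partition_id": i % 1000,
            }
            for i in range(num_files)
        ]
    return desc


def serialized_size(desc):
    """Size in bytes of the JSON encoding (a stand-in for Thrift serialization)."""
    return len(json.dumps(desc).encode("utf-8"))


if __name__ == "__main__":
    for n in (1000, 4000, 16000):
        full = serialized_size(make_table_descriptor(n))
        slim = serialized_size(make_table_descriptor(n, include_files=False))
        print(f"files={n:6d}  with_file_list={full:9d} B  without={slim} B")
```

In this toy model, dropping the per-file list makes the serialized size
independent of the number of files, which is the kind of effect the
optimization aims for.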
Finally, after we deployed the optimized version to the customer environment,
the backend startup time of the same query dropped to tens of milliseconds.
> Optimize the backend startup and planner time of large Iceberg table query
> --------------------------------------------------------------------------
>
> Key: IMPALA-12509
> URL: https://issues.apache.org/jira/browse/IMPALA-12509
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Fu Lili
> Assignee: Fu Lili
> Priority: Major
> Attachments: image-2023-10-26-15-18-55-493.png,
> image-2023-10-26-15-19-56-408.png, image-2023-10-26-15-34-28-254.png
>
>
> We found that when querying an Iceberg table with a large number of files
> (>=200000), query planning and backend startup took abnormally long (>= 2s).
> The reason was that unnecessary objects were serialized when building
> TQueryCtx. The main function involved is IcebergTable::toThriftDescriptor