[ https://issues.apache.org/jira/browse/IMPALA-12509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779783#comment-17779783 ]

Fu Lili commented on IMPALA-12509:
----------------------------------

First, we ran into a slow query problem in a customer environment. This 
customer had an Iceberg table with 200,000 files and thousands of partitions. 
Even a simple SELECT COUNT(*) on a single partition took several seconds. The 
profile showed that the average backend startup time reached 1s, which is 
unusual. After reviewing the code, we suspected that the serialization of 
TQueryCtx was the only plausible cause. Because of the customer's security 
concerns, the relevant logs and profile screenshots cannot be shared here.

We then built an Iceberg table with 4,000 files in a test environment and 
found that the size of TQueryCtx had already reached 2MB. This size clearly 
grows with the number of files, which essentially confirmed that the problem 
lies here.
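As a rough back-of-the-envelope check, the sketch below extrapolates the 
test-environment measurement to the customer's table. It assumes that the 
TQueryCtx size scales linearly with the number of files (the observation above 
only states a positive correlation) and treats "2MB" as 2 MiB; the constant and 
function names are hypothetical and not part of Impala.

```python
# Hypothetical linear model: ASSUMES TQueryCtx size grows proportionally with
# the number of data files. The comment only reports a positive correlation.
MEASURED_FILES = 4_000
MEASURED_BYTES = 2 * 1024 * 1024  # ~2MB observed in the test environment (as MiB)


def estimated_tqueryctx_bytes(num_files: int) -> float:
    """Extrapolate the serialized TQueryCtx size for a given file count."""
    # Multiply before dividing so the integer inputs give an exact result.
    return MEASURED_BYTES * num_files / MEASURED_FILES


per_file = MEASURED_BYTES / MEASURED_FILES
customer = estimated_tqueryctx_bytes(200_000)
print(f"~{per_file:.0f} bytes per file")
print(f"~{customer / (1024 * 1024):.0f} MiB for 200,000 files")
```

Under that assumption, each file adds roughly 524 bytes, so the 200,000-file 
customer table would yield a TQueryCtx of about 100 MiB, which would make it 
plausible that serialization dominates backend startup time.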

!image-2023-10-26-15-34-28-254.png!
Finally, after we deployed the optimized version to the customer environment, 
the backend startup time of the same query was reduced to tens of milliseconds.


> Optimize the backend startup and planner time of large Iceberg table query
> --------------------------------------------------------------------------
>
>                 Key: IMPALA-12509
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12509
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Fu Lili
>            Assignee: Fu Lili
>            Priority: Major
>         Attachments: image-2023-10-26-15-18-55-493.png, 
> image-2023-10-26-15-19-56-408.png, image-2023-10-26-15-34-28-254.png
>
>
> We found that when querying an Iceberg table with a large number of files 
> (>= 200,000), query planning and backend startup took abnormally long 
> (>= 2s). The reason was that unnecessary objects were serialized when 
> building TQueryCtx. The main function involved is 
> IcebergTable::toThriftDescriptor.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)