Jibing-Li opened a new pull request, #46534:
URL: https://github.com/apache/doris/pull/46534

   ### What problem does this PR solve?
   
   When running a sample ANALYZE on a partition column or a key column, BE may
   run out of memory (OOM). For a partition column, the sample SQL has to pick
   at least one tablet in each partition to calculate the NDV and cannot use
   LIMIT, so when the table has a large number of partitions and each tablet in
   each partition is quite large, the sample SQL may read too much data and
   cause BE OOM.
   Similarly, a key column cannot use LIMIT either, so a single very large
   tablet can also cause OOM.
   
   This PR tries to solve this problem.
   For partition columns, when the selected tablets contain more than 100000000
   rows, we use the ndv() function to read up to 5 partitions and get the NDV
   of those partitions, say n. If the row count of those partitions is r and
   the row count of the table is R, the table NDV is estimated as n * R / r.
   The ndv() function uses HLL (HyperLogLog), so it only needs a small amount
   of memory.
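
The extrapolation n * R / r described above can be sketched as follows. This is only an illustrative sketch, not the code in this PR: the function name `extrapolate_ndv` and its parameter names are hypothetical, and the clamp to the table row count is an extra safeguard assumed here, not something the description states.

```python
def extrapolate_ndv(sampled_ndv: int, sampled_rows: int, table_rows: int) -> int:
    """Scale the NDV of the sampled partitions to the whole table (n * R / r).

    sampled_ndv  -- n, the NDV returned by ndv() over the sampled partitions
    sampled_rows -- r, the row count of the sampled partitions
    table_rows   -- R, the row count of the whole table
    """
    if sampled_rows <= 0:
        # Nothing was sampled, so there is nothing to extrapolate from.
        return 0
    estimate = sampled_ndv * table_rows // sampled_rows
    # Assumption (not from the PR): an NDV can never exceed the row count.
    return min(estimate, table_rows)


if __name__ == "__main__":
    # 5 partitions hold 5M rows with 1000 distinct values; the table has 100M rows.
    print(extrapolate_ndv(1000, 5_000_000, 100_000_000))
```

The estimate assumes the sampled partitions are representative of the rest of the table, so a skewed partition column can still be over- or under-estimated.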
   
   For key columns, when the selected tablets contain more than 100000000 rows,
   we add limit 1000000000 to the sample SQL to cap the number of rows read.
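
The two cases above share one row-count threshold, and the decision can be summarized in a small sketch. The names `ROW_THRESHOLD`, `choose_strategy`, and the strategy labels are hypothetical and only mirror this description; they are not identifiers from the Doris code base.

```python
# Threshold taken from the description: more than 100000000 selected rows.
ROW_THRESHOLD = 100_000_000


def choose_strategy(column_kind: str, selected_rows: int) -> str:
    """Pick a sampling strategy label for a column.

    column_kind   -- "partition", "key", or anything else for ordinary columns
    selected_rows -- total row count of the tablets selected for sampling
    """
    if selected_rows <= ROW_THRESHOLD:
        # Small enough: keep the existing sample SQL unchanged.
        return "regular_sample"
    if column_kind == "partition":
        # Read up to 5 partitions with ndv() and scale the result by R / r.
        return "ndv_extrapolation"
    if column_kind == "key":
        # Cap the rows read by adding a LIMIT to the sample SQL.
        return "limit_rows"
    return "regular_sample"


if __name__ == "__main__":
    print(choose_strategy("partition", 250_000_000))
    print(choose_strategy("key", 250_000_000))
    print(choose_strategy("key", 1_000))
```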
   
   Reading 100000000 rows uses at most 8GB of memory in BE.
   
   Issue Number: close #xxx
   
   Related PR: #xxx
   
   Problem Summary:
   
   ### Release note
   
   None
   
   ### Check List (For Author)
   
   - Test <!-- At least one of them must be included. -->
       - [ ] Regression test
       - [x] Unit Test
       - [ ] Manual test (add detailed scripts or steps below)
       - [ ] No need to test or manual test. Explain why:
           - [ ] This is a refactor/code format and no logic has been changed.
           - [ ] Previous test can cover this change.
           - [ ] No code files have been changed.
           - [ ] Other reason <!-- Add your reason?  -->
   
   - Behavior changed:
       - [x] No.
       - [ ] Yes. <!-- Explain the behavior change -->
   
   - Does this need documentation?
       - [x] No.
       - [ ] Yes. <!-- Add document PR link here. eg: 
https://github.com/apache/doris-website/pull/1214 -->
   
   ### Check List (For Reviewer who merge this PR)
   
   - [ ] Confirm the release note
   - [ ] Confirm test cases
   - [ ] Confirm document
   - [ ] Add branch pick label <!-- Add branch pick label that this PR should 
merge into -->
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

