zenoyang opened a new pull request, #21888:
URL: https://github.com/apache/doris/pull/21888

   
   
   ## Proposed changes
   Previously, the merge first deserialized the ColumnString into a temporary 
HashSet (inserting every deserialized element into it), and then, during the 
merge phase, iterated over that temporary HashSet and inserted each element into 
the target HashSet.
   With this change, elements are inserted directly into the target HashSet as 
they are deserialized, which avoids the redundant round of HashSet inserts.
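   
   The difference between the two merge paths can be sketched as follows. This 
is only an illustration, not the code in this PR: it assumes 
`std::unordered_set<std::string>` as a stand-in for the HashSet and a 
`std::vector<std::string>` as a stand-in for the serialized ColumnString, and 
the names `merge_via_temporary` and `merge_direct` are hypothetical. Each 
function returns the number of insert calls it performed so the overhead can be 
compared.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Stand-ins for illustration only (assumptions, not Doris types).
using StringSet = std::unordered_set<std::string>;
using SerializedColumn = std::vector<std::string>;

// Old path: deserialize into a temporary set, then copy it into the target.
// Every surviving element is hashed and inserted twice.
std::size_t merge_via_temporary(StringSet& target, const SerializedColumn& column) {
    std::size_t insert_calls = 0;
    StringSet temporary;
    for (const auto& value : column) {
        temporary.insert(value);
        ++insert_calls;
    }
    for (const auto& value : temporary) {
        target.insert(value);
        ++insert_calls;
    }
    return insert_calls;
}

// New path: insert each deserialized element straight into the target.
// Every element is hashed and inserted once.
std::size_t merge_direct(StringSet& target, const SerializedColumn& column) {
    std::size_t insert_calls = 0;
    for (const auto& value : column) {
        target.insert(value);
        ++insert_calls;
    }
    return insert_calls;
}
```

   Both functions leave the target with the same contents. In this sketch, for 
a column of distinct values the direct path issues half as many insert calls 
and never allocates the temporary set.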
   
   In one of our internal query tests, 30 HashSets were merged in the 
second-phase aggregation (average cardinality 1,400,000 each), and the 
cardinality after merging was 42,000,000. With this change, MergeTime dropped 
from 5s965ms to 3s375ms, a reduction of about 43%.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


