wuwenw opened a new pull request #6991:
URL: https://github.com/apache/incubator-pinot/pull/6991


   ## Description
   One of the major bottlenecks for GroupBy OrderBy queries on high-cardinality 
columns is the merge phase. Every segment contributes a large number of 
intermediate results to a global concurrent map for further aggregation and 
merging, which consumes a lot of memory and time. This PR introduces an 
optimization option under which each segment trims its intermediate results to 
a given size before the merge. The size is configurable by the user and is 
guaranteed to be at least max(limit N * 5, 5000). Trimming has little effect on 
accuracy but reduces the running time for high-cardinality datasets: about 5 
times faster for String data with 10M cardinality. The option is turned off by 
default to ensure backward compatibility. 
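
   The in-segment trim described above can be sketched roughly as follows. This 
is a minimal illustration, not the code in this PR: the class name, method 
names, and the use of a `Map<String, Long>` of group key to aggregated value 
are assumptions made for the example, and it assumes an `ORDER BY` on the 
aggregated value in descending order.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch of a per-segment trim; names are illustrative only.
class SegmentTrimSketch {
    // Assumed defaults taken from the description: max(limit N * 5, 5000).
    static final int LIMIT_MULTIPLIER = 5;
    static final int MIN_TRIM_SIZE = 5000;

    // Number of groups each segment keeps for a query with the given LIMIT.
    static int trimSize(int limit) {
        return Math.max(limit * LIMIT_MULTIPLIER, MIN_TRIM_SIZE);
    }

    // Keeps the `size` groups with the largest aggregated values
    // (i.e. ORDER BY value DESC) and drops the rest.
    static Map<String, Long> trim(Map<String, Long> groups, int size) {
        if (groups.size() <= size) {
            return groups; // nothing to trim
        }
        Comparator<Map.Entry<String, Long>> byValue = Map.Entry.comparingByValue();
        // Min-heap on value: the smallest retained group sits at the head.
        PriorityQueue<Map.Entry<String, Long>> heap = new PriorityQueue<>(size + 1, byValue);
        for (Map.Entry<String, Long> entry : groups.entrySet()) {
            heap.offer(entry);
            if (heap.size() > size) {
                heap.poll(); // evict the current smallest group
            }
        }
        Map<String, Long> trimmed = new HashMap<>();
        for (Map.Entry<String, Long> entry : heap) {
            trimmed.put(entry.getKey(), entry.getValue());
        }
        return trimmed;
    }

    public static void main(String[] args) {
        Map<String, Long> groups = new HashMap<>();
        groups.put("a", 1L);
        groups.put("b", 7L);
        groups.put("c", 3L);
        // A tiny trim size is used here only to make the effect visible.
        System.out.println(trim(groups, 2).keySet());
        System.out.println(trimSize(10));
    }
}
```

   Because each segment drops its lowest-ranked groups before the merge, a 
group that ranks low in every segment but high overall could be lost, which is 
why the trim size is kept well above the query limit.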
   ## Upgrade Notes
   Does this PR prevent a zero down-time upgrade? (Assume upgrade order: 
Controller, Broker, Server, Minion)
   * [ ] Yes (Please label as **<code>backward-incompat</code>**, and complete 
the section below on Release Notes)
   
   Does this PR fix a zero-downtime upgrade issue introduced earlier?
   * [ ] Yes (Please label this as **<code>backward-incompat</code>**, and 
complete the section below on Release Notes)
   
   Does this PR otherwise need attention when creating release notes? Things to 
consider:
   - New configuration options
   - Deprecation of configurations
   - Signature changes to public methods/interfaces
   - New plugins added or old plugins removed
   * [x] Yes (Please label this PR as **<code>release-notes</code>** and 
complete the section on Release Notes)
   ## Release Notes
   Optimized GroupBy OrderBy queries by introducing an in-segment trim option 
that can significantly reduce the size of intermediate results and speed up the 
execution.
   ## Documentation
   <!-- If you have introduced a new feature or configuration, please add it to 
the documentation as well.
   See 
https://docs.pinot.apache.org/developers/developers-and-contributors/update-document
   -->
   




