davecromberge opened a new pull request, #17238:
URL: https://github.com/apache/pinot/pull/17238
Enable size-based segment generation for tables with variable-sized data
(e.g., Theta sketches) where static row counts produce inconsistent segment
sizes.
Implemented two strategies:
- AdaptiveSegmentNumRowProvider: EMA-based learning for homogeneous data
- PercentileAdaptiveSegmentNumRowProvider: Reservoir sampling with percentile
estimation for heterogeneous/multi-tenant data (resistant to outliers)
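To illustrate the difference between the two strategies, here is a minimal sketch, not the code from this PR. The class names `EmaRowEstimator` and `PercentileRowEstimator`, the `observe`/`numRows` methods, the smoothing factor `alpha`, and the reservoir capacity are all hypothetical. Both estimators track observed bytes per row and divide the target segment size by that estimate. The EMA version smooths recent observations. The percentile version keeps a uniform sample (reservoir sampling, Algorithm R) and reads a nearest-rank percentile, so a single very large row size does not dominate the estimate.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical EMA-based estimator: rows = target bytes / smoothed bytes-per-row. */
class EmaRowEstimator {
  private final long targetBytes;
  private final double alpha;           // smoothing factor in (0, 1]; an assumption
  private double avgBytesPerRow = -1;   // negative means "no observation yet"

  EmaRowEstimator(long targetBytes, double alpha) {
    this.targetBytes = targetBytes;
    this.alpha = alpha;
  }

  /** Feed the size and row count of a completed segment. */
  void observe(long segmentBytes, long numRows) {
    if (segmentBytes <= 0 || numRows <= 0) {
      return;
    }
    double sample = (double) segmentBytes / numRows;
    avgBytesPerRow = avgBytesPerRow < 0 ? sample : alpha * sample + (1 - alpha) * avgBytesPerRow;
  }

  /** Rows for the next segment; falls back to defaultRows before any observation. */
  long numRows(long defaultRows) {
    if (avgBytesPerRow <= 0) {
      return defaultRows;
    }
    return Math.max(1L, (long) (targetBytes / avgBytesPerRow));
  }
}

/** Hypothetical percentile estimator backed by a fixed-size reservoir (Algorithm R). */
class PercentileRowEstimator {
  private final long targetBytes;
  private final int percentile;         // 1..100
  private final double[] reservoir;     // sampled bytes-per-row values
  private final Random random;
  private long seen = 0;

  PercentileRowEstimator(long targetBytes, int percentile, int capacity, long seed) {
    this.targetBytes = targetBytes;
    this.percentile = percentile;
    this.reservoir = new double[capacity];
    this.random = new Random(seed);
  }

  void observe(long segmentBytes, long numRows) {
    if (segmentBytes <= 0 || numRows <= 0) {
      return;
    }
    double sample = (double) segmentBytes / numRows;
    if (seen < reservoir.length) {
      reservoir[(int) seen] = sample;                      // fill phase
    } else {
      long j = (long) (random.nextDouble() * (seen + 1));  // uniform in [0, seen]
      if (j < reservoir.length) {
        reservoir[(int) j] = sample;                       // replace with probability k / (seen + 1)
      }
    }
    seen++;
  }

  long numRows(long defaultRows) {
    int n = (int) Math.min(seen, reservoir.length);
    if (n == 0) {
      return defaultRows;
    }
    double[] sorted = Arrays.copyOf(reservoir, n);
    Arrays.sort(sorted);
    // Nearest-rank percentile over the sampled bytes-per-row values.
    int idx = Math.max(0, (int) Math.ceil(percentile / 100.0 * n) - 1);
    return Math.max(1L, (long) (targetBytes / sorted[idx]));
  }
}
```

A higher percentile gives a larger bytes-per-row estimate and therefore fewer rows per segment, which errs on the side of smaller segments.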
Configuration is read directly from the MergeRollupTask config map, following
the eraseDimensionValues pattern. There are no changes to the shared
SegmentConfig or to the framework.
Example config:
{
  "MergeRollupTask": {
    "desiredSegmentSizeBytes": "209715200",
    "segmentSizingStrategy": "PERCENTILE",
    "sizingPercentile": "75"
  }
}
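As a rough sketch of how these three keys might be read from a task config map (the key names come from the example above; the `SizingConfig` class, the default values, and the validation rule are assumptions, not the PR's code):

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical holder for the sizing options read from the MergeRollupTask config map. */
class SizingConfig {
  final long desiredSegmentSizeBytes;  // <= 0 treated as "size-based generation off" (assumed)
  final String strategy;
  final int percentile;

  private SizingConfig(long desiredSegmentSizeBytes, String strategy, int percentile) {
    this.desiredSegmentSizeBytes = desiredSegmentSizeBytes;
    this.strategy = strategy;
    this.percentile = percentile;
  }

  /** Task config values are strings, so each one is parsed explicitly. */
  static SizingConfig fromTaskConfig(Map<String, String> taskConfig) {
    long bytes = Long.parseLong(taskConfig.getOrDefault("desiredSegmentSizeBytes", "0"));
    // "DEFAULT" is a placeholder name for "no strategy configured"; an assumption.
    String strategy =
        taskConfig.getOrDefault("segmentSizingStrategy", "DEFAULT").toUpperCase(Locale.ROOT);
    int percentile = Integer.parseInt(taskConfig.getOrDefault("sizingPercentile", "75"));
    if (percentile < 1 || percentile > 100) {
      throw new IllegalArgumentException("sizingPercentile must be in [1, 100]: " + percentile);
    }
    return new SizingConfig(bytes, strategy, percentile);
  }
}
```

In the example config, 209715200 bytes is 200 MiB (200 * 1024 * 1024).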
Instructions:
The PR has to be tagged with at least one of the following labels (*):
- `feature`
- `performance`
- `release-notes` - New configuration options
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]