davecromberge opened a new pull request, #17238:
URL: https://github.com/apache/pinot/pull/17238

   Enable size-based segment generation for tables with variable-sized data
   (e.g., Theta sketches) where static row counts produce inconsistent segment
   sizes.
   
   Implemented two strategies:
   
   - AdaptiveSegmentNumRowProvider: exponential moving average (EMA)-based learning
     for homogeneous data
   - PercentileAdaptiveSegmentNumRowProvider: reservoir sampling with percentile
     estimation for heterogeneous/multi-tenant data (resistant to outliers)
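The two strategies can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `EmaRowProvider`, `PercentileRowProvider`, `observe`, `numRowsPerSegment`, the smoothing factor, the reservoir size and the "initial row count" fallback are all hypothetical. It assumes both providers learn bytes per row from segments already built and divide the byte target by that estimate. The percentile variant uses classic reservoir sampling (Algorithm R) plus a nearest-rank percentile.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Illustrative sketch only: EmaRowProvider and PercentileRowProvider are hypothetical stand-ins,
 * not the PR's AdaptiveSegmentNumRowProvider / PercentileAdaptiveSegmentNumRowProvider.
 * Both assume "bytes per row" is learned from built segments and turned into a row count.
 */
final class SegmentSizingSketch {

  /** Converts a byte target into a row count, clamped to [1, Integer.MAX_VALUE]. */
  static int rowsFor(long desiredSegmentSizeBytes, double bytesPerRow) {
    return (int) Math.max(1, Math.min(Integer.MAX_VALUE, desiredSegmentSizeBytes / bytesPerRow));
  }

  /** Smooths observed bytes-per-row with an exponential moving average (EMA). */
  static final class EmaRowProvider {
    private final long desiredSegmentSizeBytes;
    private final double alpha;         // smoothing factor in (0, 1]; higher reacts faster
    private final int initialNumRows;   // returned until a usable observation arrives
    private double emaBytesPerRow = -1; // negative means "no observation yet"

    EmaRowProvider(long desiredSegmentSizeBytes, double alpha, int initialNumRows) {
      this.desiredSegmentSizeBytes = desiredSegmentSizeBytes;
      this.alpha = alpha;
      this.initialNumRows = initialNumRows;
    }

    /** Records the size of a segment that has just been built. */
    void observe(long segmentSizeBytes, int numRows) {
      if (numRows <= 0) {
        throw new IllegalArgumentException("numRows must be positive");
      }
      double bytesPerRow = (double) segmentSizeBytes / numRows;
      emaBytesPerRow = emaBytesPerRow < 0
          ? bytesPerRow
          : alpha * bytesPerRow + (1 - alpha) * emaBytesPerRow;
    }

    int numRowsPerSegment() {
      return emaBytesPerRow <= 0 ? initialNumRows : rowsFor(desiredSegmentSizeBytes, emaBytesPerRow);
    }
  }

  /** Keeps a uniform reservoir sample of bytes-per-row and sizes segments from a percentile of it. */
  static final class PercentileRowProvider {
    private final long desiredSegmentSizeBytes;
    private final double percentile;    // e.g. 75.0, as in "sizingPercentile"
    private final int initialNumRows;
    private final double[] reservoir;
    private final Random random;
    private int filled = 0;             // reservoir slots in use
    private long seen = 0;              // total observations so far

    PercentileRowProvider(long desiredSegmentSizeBytes, double percentile, int reservoirSize,
        int initialNumRows, long seed) {
      this.desiredSegmentSizeBytes = desiredSegmentSizeBytes;
      this.percentile = percentile;
      this.initialNumRows = initialNumRows;
      this.reservoir = new double[reservoirSize];
      this.random = new Random(seed);
    }

    void observe(long segmentSizeBytes, int numRows) {
      if (numRows <= 0) {
        throw new IllegalArgumentException("numRows must be positive");
      }
      double bytesPerRow = (double) segmentSizeBytes / numRows;
      seen++;
      if (filled < reservoir.length) {
        reservoir[filled++] = bytesPerRow;
      } else {
        // Algorithm R: the seen-th sample replaces a random slot with probability size / seen.
        long j = (long) (random.nextDouble() * seen);
        if (j < reservoir.length) {
          reservoir[(int) j] = bytesPerRow;
        }
      }
    }

    int numRowsPerSegment() {
      if (filled == 0) {
        return initialNumRows;
      }
      double[] sorted = Arrays.copyOf(reservoir, filled);
      Arrays.sort(sorted);
      // Nearest-rank percentile over the sampled bytes-per-row values.
      int rank = (int) Math.ceil(percentile / 100.0 * filled);
      double bytesPerRow = sorted[Math.min(filled, Math.max(1, rank)) - 1];
      return bytesPerRow <= 0 ? initialNumRows : rowsFor(desiredSegmentSizeBytes, bytesPerRow);
    }
  }
}
```

With a 1000-byte target and observed bytes-per-row values of 1, 2, 3 and 4, the 75th-percentile provider in this sketch sizes segments at 333 rows. Adding a single outlier of 1000 bytes per row only moves the estimate to the next sample (4), giving 250 rows, whereas an EMA would be pulled sharply by the same outlier. Picking a percentile above the median of bytes per row biases this sketch toward fewer rows, so most segments land at or under the target.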
   
   Configuration is read directly from the MergeRollupTask config map, following the
   same pattern as eraseDimensionValues. No changes are made to the shared
   SegmentConfig or the framework.
   
   Example config (209715200 bytes = 200 MiB):
   {
     "MergeRollupTask": {
       "desiredSegmentSizeBytes": "209715200",
       "segmentSizingStrategy": "PERCENTILE",
       "sizingPercentile": "75"
     }
   }
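A sketch of turning this task config map into typed settings might look like the following. It assumes the task config arrives as a `Map<String, String>`, as the string-valued JSON suggests. The class names, the `ADAPTIVE` strategy name (only `PERCENTILE` appears above), the defaults and the validation rules are hypothetical choices for illustration, not the PR's behaviour.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Illustrative sketch: parses the three keys from the example config into a typed object.
 * The ADAPTIVE strategy name and all default values are assumptions, not taken from the PR.
 */
final class MergeRollupSizingConfigSketch {
  enum Strategy { ADAPTIVE, PERCENTILE }

  static final class SizingConfig {
    final long desiredSegmentSizeBytes;
    final Strategy strategy;
    final double sizingPercentile;

    SizingConfig(long desiredSegmentSizeBytes, Strategy strategy, double sizingPercentile) {
      this.desiredSegmentSizeBytes = desiredSegmentSizeBytes;
      this.strategy = strategy;
      this.sizingPercentile = sizingPercentile;
    }
  }

  /** Returns null when desiredSegmentSizeBytes is absent, i.e. size-based sizing is not enabled. */
  static SizingConfig parse(Map<String, String> taskConfig) {
    String size = taskConfig.get("desiredSegmentSizeBytes");
    if (size == null) {
      return null;
    }
    long desired = Long.parseLong(size.trim());
    if (desired <= 0) {
      throw new IllegalArgumentException("desiredSegmentSizeBytes must be positive: " + size);
    }
    // Assumed default strategy when the key is omitted.
    Strategy strategy = Strategy.valueOf(
        taskConfig.getOrDefault("segmentSizingStrategy", "ADAPTIVE").trim().toUpperCase(Locale.ROOT));
    // Assumed default percentile; only meaningful for the PERCENTILE strategy.
    double percentile = Double.parseDouble(taskConfig.getOrDefault("sizingPercentile", "50").trim());
    if (percentile <= 0 || percentile > 100) {
      throw new IllegalArgumentException("sizingPercentile must be in (0, 100]: " + percentile);
    }
    return new SizingConfig(desired, strategy, percentile);
  }
}
```

Returning null when the size key is missing keeps this sketch opt-in: tables that do not set desiredSegmentSizeBytes would keep their existing static row counts.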
   
   Instructions:
   
   The PR has to be tagged with at least one of the following labels (*):
   - `feature`
   - `performance`
   - `release-notes` - New configuration options
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

