xiangfu0 opened a new pull request, #17532:
URL: https://github.com/apache/pinot/pull/17532
### Motivation
Large-table sampling needs to be deterministic and avoid query-time segment
selection overhead. This adds a pluggable “table sampler” definition in table
config and precomputes sampler-specific routing entries at the broker.
### Key changes
- Config: add `tableSamplers` to `TableConfig` (+ ZK SerDe support) and a
  query option `tableSampler=<name>`.
- Broker routing:
  - Build and cache sampler-specific routing entries per table sampler name.
  - Select the routing entry at query time by the `tableSampler` option
    (falling back to the default when absent or unknown).
  - Refresh sampler routing entries on Helix assignment changes
    (IdealState/ExternalView updates).
- Built-in samplers:
  - `firstN`: select the first N segments (lexicographic order)
  - `nPerDay`: select up to N segments per day using segment ZK start time
    metadata
- MSQ support: propagate query options into MSQ leaf routing requests so
  `tableSampler` works with the multi-stage engine.
- Tests:
  - Unit test for `nPerDay`
  - Integration test (shared cluster) validating 10 segments/day × 7 days →
    sampler returns 1 segment/day and group-by results reflect that
- Quickstart: add sample `tableSamplers` config to batch airlineStats table
config.
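To make the two built-in samplers concrete, here is a minimal sketch of the selection logic as described above. It is not the code in this PR: the class `SamplerSketch`, the `SegmentInfo` holder, and the method signatures are hypothetical, and the choice of which segments win within a day (lexicographic by name) is an assumption the description does not confirm.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Illustrative only: hypothetical names, not the classes added by this PR. */
public final class SamplerSketch {

  /** Hypothetical stand-in for segment ZK metadata: name plus start time in epoch millis. */
  public static final class SegmentInfo {
    final String name;
    final long startTimeMs;

    public SegmentInfo(String name, long startTimeMs) {
      this.name = name;
      this.startTimeMs = startTimeMs;
    }
  }

  private SamplerSketch() {
  }

  /** firstN: sort segment names lexicographically and keep the first n. */
  public static List<String> firstN(List<String> segmentNames, int n) {
    return segmentNames.stream()
        .sorted()
        .limit(Math.max(n, 0))
        .collect(Collectors.toList());
  }

  /**
   * nPerDay: keep up to n segments for each calendar day, where the day is
   * derived from the segment start time in the given time zone.
   * Assumption: within a day, segments are taken in lexicographic name order.
   */
  public static List<String> nPerDay(List<SegmentInfo> segments, int n, ZoneId zone) {
    List<SegmentInfo> sorted = new ArrayList<>(segments);
    sorted.sort(Comparator.comparing((SegmentInfo s) -> s.name));

    Map<LocalDate, Integer> countPerDay = new HashMap<>();
    List<String> selected = new ArrayList<>();
    for (SegmentInfo segment : sorted) {
      LocalDate day = Instant.ofEpochMilli(segment.startTimeMs).atZone(zone).toLocalDate();
      int count = countPerDay.getOrDefault(day, 0);
      if (count < n) {
        countPerDay.put(day, count + 1);
        selected.add(segment.name);
      }
    }
    return selected;
  }
}
```

Because the output depends only on segment names and start times, the same input always yields the same sample, which is the determinism the motivation asks for. With the integration-test shape above (10 segments/day × 7 days) and `n = 1`, this sketch selects 7 segments.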
### How to use
- Table config example:
  - `tableSamplers: [{ name: "perDay", type: "nPerDay", properties: { numSegmentsPerDay: "1", timezone: "UTC" } }]`
- Query:
  - `SET tableSampler=perDay; ...` or request `queryOptions: "tableSampler=perDay"`
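The two ways of selecting a sampler can be sketched as plain string construction. This is a hedged illustration only: the class `SamplerQuerySketch` and its methods are hypothetical helpers, the query text is made up, and the JSON shape (`sql` plus `queryOptions` fields) assumes the request is sent as a broker SQL query payload.

```java
/** Illustrative only: hypothetical helper, not part of this PR. */
public final class SamplerQuerySketch {

  private SamplerQuerySketch() {
  }

  /** Form 1: prefix the SQL with a SET statement naming the sampler. */
  public static String withSetStatement(String samplerName, String sql) {
    return "SET tableSampler=" + samplerName + "; " + sql;
  }

  /** Form 2: pass the sampler through the request's queryOptions field. */
  public static String requestBody(String samplerName, String sql) {
    return "{\"sql\":\"" + escape(sql) + "\","
        + "\"queryOptions\":\"tableSampler=" + escape(samplerName) + "\"}";
  }

  /** Minimal JSON string escaping for backslashes and double quotes only. */
  private static String escape(String value) {
    return value.replace("\\", "\\\\").replace("\"", "\\\"");
  }
}
```

For example, `SamplerQuerySketch.withSetStatement("perDay", "SELECT COUNT(*) FROM airlineStats")` produces a query that starts with `SET tableSampler=perDay;`, matching the first form above.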
### Compatibility
- Fully backward compatible: if no sampler is configured or selected,
routing behavior is unchanged.
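The fallback behavior can be pictured as a lookup over precomputed entries. The sketch below is an assumption-laden illustration, not the broker code: `SamplerRoutingSketch` is a hypothetical class, and routing entries are simplified to lists of segment names.

```java
import java.util.List;
import java.util.Map;

/** Illustrative only: hypothetical class, routing entries reduced to segment-name lists. */
public final class SamplerRoutingSketch {
  private final List<String> defaultSegments;
  private final Map<String, List<String>> samplerSegments;

  public SamplerRoutingSketch(List<String> defaultSegments,
      Map<String, List<String>> samplerSegments) {
    this.defaultSegments = List.copyOf(defaultSegments);
    this.samplerSegments = Map.copyOf(samplerSegments);
  }

  /**
   * Returns the precomputed entry for the sampler named in the query options,
   * or the default entry when the option is absent or names an unknown sampler.
   */
  public List<String> segmentsFor(Map<String, String> queryOptions) {
    String samplerName = queryOptions.get("tableSampler");
    if (samplerName == null) {
      return defaultSegments;
    }
    return samplerSegments.getOrDefault(samplerName, defaultSegments);
  }
}
```

Since the per-sampler entries are built ahead of time, the query path in this sketch is a single map lookup, and a table with no samplers configured always takes the default branch.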
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]