xiangfu0 opened a new pull request, #18165: URL: https://github.com/apache/pinot/pull/18165
## What changed

This adds partition function expressions so Pinot can compute segment partitions and query-time pruning partitions from the raw column through a deterministic scalar-function pipeline instead of a single hard-coded partition function.

Key changes:
- add expression-mode partition config support via `functionExpr` while preserving existing `functionName` behavior
- compile restricted partition expressions into a typed pipeline in `pinot-segment-spi`
- allow deterministic scalar functions, including varargs with literals, while enforcing a single raw-column argument chain
- persist expression-based partition metadata through offline and realtime paths
- enable broker and server pruning for equality and IN predicates on the raw column using the same partition pipeline
- add built-in coverage for md5/fnv/murmur/lower/bucket-style cases and compatibility helpers

## Why

Pinot's partition model assumed one partition function per raw column. That breaks down for common production layouts where the upstream partition key is derived by chaining transforms such as `md5(id) -> fnv1a_32(...) -> partition` or `lower(key) -> murmur2(...) -> partition`. Derived ingestion columns do not preserve correct pruning semantics when queries still filter on the raw column. This PR keeps partitioning attached to the raw column and evaluates the same deterministic pipeline against query literals so pruning remains correct.

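To illustrate the idea of evaluating one deterministic pipeline at both ingestion time and query time, here is a minimal standalone sketch. It is not the code in this PR: the class and method names are hypothetical, and the choices to hash the lowercase hex form of the MD5 digest and to map the hash to a partition with `Math.floorMod` are assumptions made for illustration, so the partition ids it produces may differ from Pinot's.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; none of these names come from Pinot.
public class PartitionPipelineSketch {

  // Step 1 (assumed encoding): MD5 of the UTF-8 bytes, rendered as lowercase hex.
  static String md5Hex(String input) {
    try {
      MessageDigest digest = MessageDigest.getInstance("MD5");
      byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
      StringBuilder sb = new StringBuilder(hash.length * 2);
      for (byte b : hash) {
        sb.append(String.format("%02x", b & 0xff));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 is not available", e);
    }
  }

  // Step 2: standard 32-bit FNV-1a over the UTF-8 bytes of the input.
  static int fnv1a32(String input) {
    int hash = 0x811c9dc5; // FNV offset basis
    for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
      hash ^= (b & 0xff);
      hash *= 0x01000193; // FNV prime; int overflow wraps as intended
    }
    return hash;
  }

  // Full chain: raw value -> md5 -> fnv1a_32 -> partition id.
  // floorMod keeps the result in [0, numPartitions); this mapping is an assumption.
  static int partitionOf(String rawValue, int numPartitions) {
    return Math.floorMod(fnv1a32(md5Hex(rawValue)), numPartitions);
  }

  public static void main(String[] args) {
    String id = "000016be-9d72-466c-9632-cfa680dc8fa3";
    int atIngestion = partitionOf(id, 128); // computed when the segment is built
    int atQueryTime = partitionOf(id, 128); // computed from the query literal
    // Both sides run the same pipeline on the raw value, so they always agree.
    System.out.println(atIngestion == atQueryTime);
  }
}
```

Because the query literal goes through the same chain as the ingested raw value, a filter such as `id = '...'` can be matched against segment partition metadata without the query referring to a derived column.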
## Safety and correctness

This also fixes a few issues discovered during review:
- non-static scalar functions now use thread-local target instances in the partition-expression compiler
- broker and server pruning fail open if partition-function evaluation throws for a query literal
- broker partition metadata refresh treats invalid expression metadata as unprunable instead of failing the refresh path

## Validation

Ran:
- `./mvnw spotless:apply -pl pinot-common,pinot-spi,pinot-segment-spi,pinot-segment-local,pinot-core,pinot-broker,pinot-controller,pinot-tools`
- `./mvnw checkstyle:check -pl pinot-common,pinot-spi,pinot-segment-spi,pinot-segment-local,pinot-core,pinot-broker,pinot-controller,pinot-tools`
- `./mvnw license:format -pl pinot-common,pinot-spi,pinot-segment-spi,pinot-segment-local,pinot-core,pinot-broker,pinot-controller,pinot-tools`
- `./mvnw license:check -pl pinot-common,pinot-spi,pinot-segment-spi,pinot-segment-local,pinot-core,pinot-broker,pinot-controller,pinot-tools`
- `./mvnw -pl pinot-core,pinot-broker,pinot-segment-spi -am test -Dtest=ColumnValueSegmentPrunerTest,PartitionFunctionExprCompilerTest,PartitionFunctionExprSegmentPrunerTest,SegmentPartitionMetadataManagerTest -Dsurefire.failIfNoSpecifiedTests=false`

Notable regression coverage includes:
- `fnv1a_32(md5(id))` with 128 partitions and UUID `000016be-9d72-466c-9632-cfa680dc8fa3` mapping to partition 104
- broker pruning for the same UUID equality predicate
- fail-open behavior for invalid literals and invalid partition-expression metadata
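
The fail-open pruning behavior described under "Safety and correctness" can be sketched roughly as follows. This is a simplified illustration, not the pruner code from this PR: `FailOpenPruningSketch`, `canPrune`, and the use of `ToIntFunction` as the partition function are all hypothetical stand-ins.

```java
import java.util.Set;
import java.util.function.ToIntFunction;

// Hypothetical illustration only; these names are not Pinot APIs.
public class FailOpenPruningSketch {

  // Returns true only when the segment provably cannot contain the literal.
  static boolean canPrune(String literal, ToIntFunction<String> partitionFn,
      Set<Integer> segmentPartitions) {
    try {
      int partition = partitionFn.applyAsInt(literal);
      // Prune when the literal's partition is not among the segment's partitions.
      return !segmentPartitions.contains(partition);
    } catch (RuntimeException e) {
      // Fail open: if the partition function cannot evaluate the literal,
      // keep the segment so the query still scans it.
      return false;
    }
  }

  public static void main(String[] args) {
    Set<Integer> segmentPartitions = Set.of(104);
    // A partition function that rejects its input stands in for an invalid literal.
    ToIntFunction<String> failing = value -> {
      throw new IllegalArgumentException("cannot evaluate: " + value);
    };
    System.out.println(canPrune("not-a-valid-literal", failing, segmentPartitions));
  }
}
```

Keeping the segment when evaluation fails costs some extra scanning but cannot drop matching rows, which is why the failure branch returns `false` rather than propagating the exception.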
