xiangfu0 opened a new pull request, #18165:
URL: https://github.com/apache/pinot/pull/18165

   ## What changed
   
   This adds partition function expressions so Pinot can compute segment 
partitions and query-time pruning partitions from the raw column through a 
deterministic scalar-function pipeline instead of a single hard-coded partition 
function.
   
   Key changes:
   - add expression-mode partition config support via `functionExpr` while 
preserving existing `functionName` behavior
   - compile restricted partition expressions into a typed pipeline in 
`pinot-segment-spi`
   - allow deterministic scalar functions, including varargs with literals, 
while enforcing a single raw-column argument chain
   - persist expression-based partition metadata through offline and realtime 
paths
   - enable broker and server pruning for equality and IN predicates on the raw 
column using the same partition pipeline
   - add built-in coverage for md5/fnv/murmur/lower/bucket-style cases and 
compatibility helpers
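
To make the compile step concrete, here is a minimal Python sketch (Pinot itself is Java, and this is not the PR's compiler). It assumes a nested-call expression syntax like the `fnv1a_32(md5(id))` case listed under Validation, and it supports only single-argument calls, so the varargs-with-literals feature is left out. The names `REGISTRY`, `compile_expr`, and `partition_of` are hypothetical, and the string/byte conventions of `md5` and `fnv1a_32` here are assumptions that may not match Pinot's scalar functions, so the partition ids it produces should not be compared with Pinot's.

```python
import hashlib
import re
from typing import Callable, List


def md5_hex(value: str) -> str:
    # Assumption: md5 yields a lowercase hex string of the UTF-8 bytes.
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def fnv1a_32(value: str) -> int:
    # Standard FNV-1a 32-bit over UTF-8 bytes, returned as an unsigned int.
    h = 0x811C9DC5  # offset basis
    for byte in value.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # multiply by FNV prime, wrap to 32 bits
    return h


# Hypothetical allow-list of deterministic scalar functions.
REGISTRY = {"md5": md5_hex, "fnv1a_32": fnv1a_32, "lower": str.lower}

# Matches `name(<inner>)`, where <inner> may itself be a nested call.
CALL = re.compile(r"^\s*(\w+)\s*\((.*)\)\s*$", re.S)


def compile_expr(expr: str, column: str) -> List[Callable]:
    """Compile nested calls into ordered steps, innermost function first."""
    steps: List[Callable] = []
    current = expr
    while True:
        match = CALL.match(current)
        if match is None:
            break
        name, inner = match.group(1).lower(), match.group(2)
        if name not in REGISTRY:
            raise ValueError(f"function not allowed: {name}")
        steps.append(REGISTRY[name])
        current = inner
    # The innermost argument must be exactly the raw column.
    if current.strip() != column:
        raise ValueError(f"expected a single chain over column {column!r}")
    steps.reverse()
    return steps


def partition_of(value: str, steps: List[Callable], num_partitions: int) -> int:
    """Run the compiled steps on a raw value and reduce to a partition id."""
    result = value
    for step in steps:
        result = step(result)
    if not isinstance(result, int):
        raise TypeError("pipeline must end in an integer-valued function")
    return result % num_partitions


# Hypothetical config fragment; only the `functionExpr` key comes from the PR text.
config = {"functionExpr": "fnv1a_32(md5(id))", "numPartitions": 128}
steps = compile_expr(config["functionExpr"], "id")
partition = partition_of("000016be-9d72-466c-9632-cfa680dc8fa3", steps, config["numPartitions"])
```

Compiling once into a flat list of steps means the same object can be reused for every ingested row and every query literal, which is the property the pruning path relies on.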
   
   ## Why
   
   Pinot's partition model assumed one partition function per raw column. That 
breaks down for common production layouts where the upstream partition key is 
derived by chaining transforms such as `md5(id) -> fnv1a_32(...) -> partition` 
or `lower(key) -> murmur2(...) -> partition`. Materializing the derived key as 
a separate ingestion column does not preserve correct pruning semantics, 
because queries still filter on the raw column.
   
   This PR keeps partitioning attached to the raw column and evaluates the same 
deterministic pipeline against query literals so pruning remains correct.
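
The pruning argument can be sketched in a few lines of Python (illustrative only, not the PR's code). The sketch assumes a `lower(key) -> hash -> modulo` pipeline, with `zlib.crc32` standing in for the real hash purely for illustration. `NUM_PARTITIONS`, `partition_of`, and `can_prune` are hypothetical names.

```python
import zlib
from typing import Iterable, Set

NUM_PARTITIONS = 8  # hypothetical partition count


def partition_of(raw_value: str) -> int:
    # Stand-in pipeline: lower(key) -> crc32(...) -> modulo.
    # crc32 is NOT the hash used by Pinot; it only keeps the sketch short.
    normalized = raw_value.lower()
    return zlib.crc32(normalized.encode("utf-8")) % NUM_PARTITIONS


def can_prune(segment_partitions: Set[int], literals: Iterable[str]) -> bool:
    """True when no query literal maps to a partition the segment holds.

    An equality predicate is the one-literal case of IN.
    """
    return all(partition_of(v) not in segment_partitions for v in literals)


# Ingestion records the partition of the raw value "Alice" for this segment.
segment = {partition_of("Alice")}

# A query on the raw column with the same value must keep the segment.
keep = not can_prune(segment, ["Alice"])
```

Because ingestion and pruning call the same `partition_of`, a literal can only be pruned away from segments that cannot contain a row with that raw value.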
   
   ## Safety and correctness
   
   This also fixes a few issues discovered during review:
   - non-static scalar functions now use thread-local target instances in the 
partition-expression compiler
   - broker and server pruning fail open if partition-function evaluation 
throws for a query literal
   - broker partition metadata refresh treats invalid expression metadata as 
unprunable instead of failing the refresh path
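
The fail-open and thread-local behaviors above can be illustrated with a short Python sketch (not the PR's Java code). It assumes a hypothetical `StatefulHasher` standing in for a non-static scalar function, and a toy integer-modulo partitioner that raises on non-numeric literals. All names here are made up for illustration.

```python
import threading
from typing import Iterable, Optional, Set


class StatefulHasher:
    """Hypothetical non-static scalar function holding mutable state."""

    def __init__(self) -> None:
        self.last_input = None  # mutable state, unsafe to share across threads

    def bucket(self, literal: str, num_partitions: int) -> int:
        self.last_input = literal
        # int() raises ValueError for literals that are not integers.
        return int(literal) % num_partitions


_local = threading.local()


def hasher_for_current_thread() -> StatefulHasher:
    # Each thread lazily gets its own instance, so state is never shared.
    hasher = getattr(_local, "hasher", None)
    if hasher is None:
        hasher = StatefulHasher()
        _local.hasher = hasher
    return hasher


def can_prune(
    segment_partitions: Optional[Set[int]],
    literals: Iterable[str],
    num_partitions: int,
) -> bool:
    # Invalid or missing partition metadata: treat the segment as unprunable.
    if segment_partitions is None:
        return False
    for literal in literals:
        try:
            partition = hasher_for_current_thread().bucket(literal, num_partitions)
        except Exception:
            # Fail open: if evaluation throws, keep the segment.
            return False
        if partition in segment_partitions:
            return False
    return True
```

Failing open trades a little pruning efficiency for correctness: a segment that is scanned unnecessarily costs time, while a segment that is pruned wrongly drops rows from the result.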
   
   ## Validation
   
   Ran:
   - `./mvnw spotless:apply -pl pinot-common,pinot-spi,pinot-segment-spi,pinot-segment-local,pinot-core,pinot-broker,pinot-controller,pinot-tools`
   - `./mvnw checkstyle:check -pl pinot-common,pinot-spi,pinot-segment-spi,pinot-segment-local,pinot-core,pinot-broker,pinot-controller,pinot-tools`
   - `./mvnw license:format -pl pinot-common,pinot-spi,pinot-segment-spi,pinot-segment-local,pinot-core,pinot-broker,pinot-controller,pinot-tools`
   - `./mvnw license:check -pl pinot-common,pinot-spi,pinot-segment-spi,pinot-segment-local,pinot-core,pinot-broker,pinot-controller,pinot-tools`
   - `./mvnw -pl pinot-core,pinot-broker,pinot-segment-spi -am test -Dtest=ColumnValueSegmentPrunerTest,PartitionFunctionExprCompilerTest,PartitionFunctionExprSegmentPrunerTest,SegmentPartitionMetadataManagerTest -Dsurefire.failIfNoSpecifiedTests=false`
   
   Notable regression coverage includes:
   - `fnv1a_32(md5(id))` with 128 partitions and UUID 
`000016be-9d72-466c-9632-cfa680dc8fa3` mapping to partition 104
   - broker pruning for the same UUID equality predicate
   - fail-open behavior for invalid literals and invalid partition-expression 
metadata
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

