xiangfu0 opened a new pull request, #18549:
URL: https://github.com/apache/pinot/pull/18549

   ## Summary
   
   Reject uploaded segments for offline upsert tables unless the configured 
partition column has exactly one valid partition id in the segment metadata.
   
   The validation runs in the controller upload path for both single-segment 
and batch-segment uploads before storage quota checks and before segment ZK 
metadata is created.
   
   ## Why
   
   Offline upsert correctness depends on all records for a primary-key 
partition being assigned to the same server set. If an uploaded segment spans 
multiple partition ids, segment assignment can place mixed-partition data as 
though it belonged to one partition, which can produce incorrect upsert/dedup 
state.
   
   ## User Manual
   
   For offline upsert tables, build and upload segments with partition metadata 
for the column configured in both `indexingConfig.segmentPartitionConfig` and 
the replica-group partition assignment (`replicaGroupStrategyConfig.partitionColumn` 
in the sample config below). Each uploaded segment must contain rows for 
exactly one partition id of that column.
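
One way to satisfy the one-partition-per-segment rule is to bucket rows by partition id before building segments, so that each bucket becomes its own segment. The sketch below is illustrative only: `PartitionedSegmentPlanner` and `groupByPartition` are hypothetical names, not Pinot APIs, and the partition function is passed in by the caller. In a real pipeline that function would need to produce the same ids as the function named in the table config, otherwise the segment metadata would presumably not match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToIntFunction;

/** Hypothetical helper (not part of Pinot): buckets primary keys by partition id. */
final class PartitionedSegmentPlanner {
  private PartitionedSegmentPlanner() {
  }

  /**
   * Groups keys so that every bucket holds keys of exactly one partition id.
   * Each bucket can then be built and uploaded as a separate segment.
   *
   * @param partitionFunction caller-supplied stand-in for the configured partition function
   */
  static <K> Map<Integer, List<K>> groupByPartition(List<K> primaryKeys,
      ToIntFunction<K> partitionFunction, int numPartitions) {
    Map<Integer, List<K>> buckets = new TreeMap<>();
    for (K key : primaryKeys) {
      int partitionId = partitionFunction.applyAsInt(key);
      // Fail early on ids the upload check would reject anyway.
      if (partitionId < 0 || partitionId >= numPartitions) {
        throw new IllegalArgumentException(
            "partition id " + partitionId + " is outside [0, " + numPartitions + ")");
      }
      buckets.computeIfAbsent(partitionId, id -> new ArrayList<>()).add(key);
    }
    return buckets;
  }
}
```

With a toy modulo function and `numPartitions = 16`, keys `1` and `17` land in the same bucket and would go into the same segment, while key `2` gets a segment of its own.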
   
   Uploads are rejected with `400 BAD_REQUEST` when:
   
   - the offline upsert table is missing segment partition config
   - the configured partition column is missing partition metadata in the 
segment
   - the segment contains zero or multiple partition ids for that column
   - the partition id is outside `[0, numPartitions)`
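
The four rejection conditions above can be read as a single ordered check. The following is a minimal sketch of that logic under stated assumptions, not the code in this PR: the class name, method name, and the plain `Map`/`Set` inputs are hypothetical stand-ins for the table config and segment metadata objects the controller actually uses. A non-empty result corresponds to a `400 BAD_REQUEST` rejection.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical stand-in for the controller-side check; not Pinot's actual API. */
final class OfflineUpsertPartitionCheck {
  private OfflineUpsertPartitionCheck() {
  }

  /**
   * Returns a rejection reason, or empty if the segment is acceptable.
   *
   * @param partitionColumn configured partition column, or null if the table
   *                        has no segment partition config
   * @param numPartitions number of partitions configured for that column
   * @param segmentPartitions column name to partition ids recorded in the segment metadata
   */
  static Optional<String> findRejection(String partitionColumn, int numPartitions,
      Map<String, Set<Integer>> segmentPartitions) {
    if (partitionColumn == null) {
      return Optional.of("table is missing segment partition config");
    }
    Set<Integer> ids = segmentPartitions.get(partitionColumn);
    if (ids == null) {
      return Optional.of("segment has no partition metadata for column " + partitionColumn);
    }
    if (ids.size() != 1) {
      return Optional.of("expected exactly one partition id but found " + ids.size());
    }
    int id = ids.iterator().next();
    if (id < 0 || id >= numPartitions) {
      return Optional.of("partition id " + id + " is outside [0, " + numPartitions + ")");
    }
    return Optional.empty();
  }
}
```

For example, `findRejection("pk", 16, Map.of("pk", Set.of(3)))` returns an empty result, while passing `Set.of(3, 4)` or `Set.of(16)` for `pk` returns a rejection reason.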
   
   ## Sample Table Config
   
   ```json
   {
     "tableName": "myTable_OFFLINE",
     "tableType": "OFFLINE",
     "upsertConfig": {
       "mode": "FULL"
     },
     "routing": {
       "instanceSelectorType": "strictReplicaGroup"
     },
     "segmentsConfig": {
       "replication": "2",
       "replicaGroupStrategyConfig": {
         "partitionColumn": "pk",
         "numInstancesPerPartition": 1
       }
     },
     "indexingConfig": {
       "segmentPartitionConfig": {
         "columnPartitionMap": {
           "pk": {
             "functionName": "Murmur",
             "numPartitions": 16
           }
         }
       }
     }
   }
   ```
   
   With this config, each uploaded segment must have partition metadata for 
`pk` containing exactly one id, such as `{3}`. A segment containing `{3, 4}` is 
rejected.
   
   ## Tests
   
   - `./mvnw -pl pinot-controller -am -Dtest=SegmentValidationUtilsTest 
-Dsurefire.failIfNoSpecifiedTests=false test`
   - `./mvnw spotless:apply -pl pinot-controller`
   - `./mvnw checkstyle:check -pl pinot-controller`
   - `./mvnw license:format -pl pinot-controller`
   - `./mvnw license:check -pl pinot-controller`
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
