xiangfu0 opened a new pull request, #18549:
URL: https://github.com/apache/pinot/pull/18549
## Summary
Reject uploaded segments for offline upsert tables unless the configured
partition column has exactly one valid partition id in the segment metadata.
The validation runs in the controller upload path for both single-segment
and batch-segment uploads before storage quota checks and before segment ZK
metadata is created.
## Why
Offline upsert correctness depends on all records for a primary-key
partition being assigned to the same server set. If an uploaded segment spans
multiple partition ids, segment assignment can place mixed-partition data as
though it belonged to one partition, which can produce incorrect upsert/dedup
state.
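The failure mode can be sketched with a toy placement model. It assumes, hypothetically, that partition id `p` is served by `server-(p % 4)` and that a mixed segment is placed by a single one of its ids. `MixedPartitionDemo`, `serverFor`, and `holders` are illustrative names, not Pinot code. Key `b` ends up on two servers. If each server resolves upserts only over the segments it hosts, the two copies of `b` are never compared.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model of partition-based placement; not Pinot's assignment code.
// Assumptions: partition id p is served by "server-" + (p % NUM_SERVERS), and each
// segment is placed by the single partition id it is assumed to carry.
final class MixedPartitionDemo {
  static final int NUM_SERVERS = 4;

  static String serverFor(int partitionId) {
    return "server-" + (partitionId % NUM_SERVERS);
  }

  /** Returns, for every primary key, the sorted set of servers hosting a copy of it. */
  static Map<String, Set<String>> holders(Map<String, Integer> segmentPartition,
                                          Map<String, List<String>> segmentKeys) {
    Map<String, Set<String>> result = new TreeMap<>();
    for (Map.Entry<String, List<String>> seg : segmentKeys.entrySet()) {
      String server = serverFor(segmentPartition.get(seg.getKey()));
      for (String pk : seg.getValue()) {
        result.computeIfAbsent(pk, k -> new TreeSet<>()).add(server);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Key "a" belongs to partition 3 and key "b" to partition 4 (given, not hashed).
    // seg1 is mixed ({3, 4}) but is placed as if it were partition 3.
    Map<String, Integer> placedAs = Map.of("seg1", 3, "seg2", 4);
    Map<String, List<String>> keys = Map.of(
        "seg1", List.of("a", "b"),
        "seg2", List.of("b"));
    // Prints {a=[server-3], b=[server-0, server-3]}
    System.out.println(holders(placedAs, keys));
  }
}
```

Rejecting `seg1` at upload time is what keeps each key on exactly one server in this model.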
## User Manual
For offline upsert tables, build and upload segments with partition metadata
for the column that is configured both in `indexingConfig.segmentPartitionConfig`
and as the replica-group partition column
(`segmentsConfig.replicaGroupStrategyConfig.partitionColumn`). Each uploaded
segment must contain rows for exactly one partition id of that column.
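One way to meet this requirement is to group rows by partition id before building segments, so that each group becomes its own segment. The sketch below is hypothetical. `partitionOf` uses `String.hashCode()` as a stand-in and is NOT Pinot's `Murmur` function. A real pipeline must apply the same function and `numPartitions` as the table config so the ids match the segment metadata. Rows are reduced to their primary keys for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical pre-build step: split rows so every segment covers one partition id.
final class PartitionRowsSketch {
  // Stand-in partition function for illustration only (not Murmur).
  static int partitionOf(String pk, int numPartitions) {
    return Math.floorMod(pk.hashCode(), numPartitions);
  }

  /** Groups primary keys by partition id; each group would be built into its own segment. */
  static SortedMap<Integer, List<String>> groupByPartition(List<String> pks, int numPartitions) {
    SortedMap<Integer, List<String>> groups = new TreeMap<>();
    for (String pk : pks) {
      groups.computeIfAbsent(partitionOf(pk, numPartitions), p -> new ArrayList<>()).add(pk);
    }
    return groups;
  }

  public static void main(String[] args) {
    // "a", "b", "c" hash to 97, 98, 99, which map to partitions 1, 2, 3 with 16 partitions.
    groupByPartition(List.of("a", "b", "c", "a"), 16)
        .forEach((p, rows) -> System.out.println("partition " + p + " -> " + rows));
  }
}
```

With the stand-in function this prints `partition 1 -> [a, a]`, `partition 2 -> [b]`, and `partition 3 -> [c]`. Every resulting segment carries a single partition id.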
Uploads are rejected with `400 BAD_REQUEST` when:
- the offline upsert table is missing segment partition config
- the configured partition column is missing partition metadata in the
segment
- the segment contains zero or multiple partition ids for that column
- the partition id is outside `[0, numPartitions)`
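The rejection rules above can be sketched as one check over the segment's per-column partition metadata. This is a hypothetical illustration, not the code in this PR. `OfflineUpsertPartitionCheck` and `rejectionReason` are made-up names, and the metadata is modeled as a plain `Map<String, Set<Integer>>`. The method returns `null` for an acceptable segment and otherwise a reason, which the controller would report as `400 BAD_REQUEST`.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the upload-time rules for offline upsert tables.
final class OfflineUpsertPartitionCheck {
  /**
   * @param partitionColumn   configured partition column, or null if the table has no
   *                          segment partition config
   * @param numPartitions     numPartitions configured for that column
   * @param segmentPartitions per-column partition ids taken from the segment metadata
   * @return null if the segment is acceptable, otherwise a rejection reason
   */
  static String rejectionReason(String partitionColumn, int numPartitions,
                                Map<String, Set<Integer>> segmentPartitions) {
    if (partitionColumn == null) {
      return "offline upsert table is missing segment partition config";
    }
    Set<Integer> ids = segmentPartitions.get(partitionColumn);
    if (ids == null) {
      return "segment has no partition metadata for column " + partitionColumn;
    }
    if (ids.size() != 1) {
      // Covers both the zero-id and the multiple-id case.
      return "expected exactly one partition id for " + partitionColumn + " but found " + ids.size();
    }
    int id = ids.iterator().next();
    if (id < 0 || id >= numPartitions) {
      return "partition id " + id + " is outside [0, " + numPartitions + ")";
    }
    return null;
  }

  public static void main(String[] args) {
    System.out.println(rejectionReason("pk", 16, Map.of("pk", Set.of(3))));     // null
    System.out.println(rejectionReason("pk", 16, Map.of("pk", Set.of(3, 4))));  // found 2
    System.out.println(rejectionReason("pk", 16, Map.of("pk", Set.of(16))));    // out of range
    System.out.println(rejectionReason("pk", 16, Map.of()));                    // no metadata
  }
}
```

In this sketch the range check runs only after the single-id check passes, so each rejected segment reports one reason.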
## Sample Table Config
```json
{
  "tableName": "myTable_OFFLINE",
  "tableType": "OFFLINE",
  "upsertConfig": {
    "mode": "FULL"
  },
  "routing": {
    "instanceSelectorType": "strictReplicaGroup"
  },
  "segmentsConfig": {
    "replication": "2",
    "replicaGroupStrategyConfig": {
      "partitionColumn": "pk",
      "numInstancesPerPartition": 1
    }
  },
  "indexingConfig": {
    "segmentPartitionConfig": {
      "columnPartitionMap": {
        "pk": {
          "functionName": "Murmur",
          "numPartitions": 16
        }
      }
    }
  }
}
```
With this config, each uploaded segment must have partition metadata for
`pk` containing exactly one id, such as `{3}`. A segment containing `{3, 4}` is
rejected.
## Tests
- `./mvnw -pl pinot-controller -am -Dtest=SegmentValidationUtilsTest -Dsurefire.failIfNoSpecifiedTests=false test`
- `./mvnw spotless:apply -pl pinot-controller`
- `./mvnw checkstyle:check -pl pinot-controller`
- `./mvnw license:format -pl pinot-controller`
- `./mvnw license:check -pl pinot-controller`