rohityadav1993 opened a new issue, #12987: URL: https://github.com/apache/pinot/issues/12987
## Problem

#6567 allows uploading a batch-generated segment to a Pinot upsert realtime table. Partitioned data is handled by defining the partition column in `segmentPartitionConfig.columnPartitionMap`. The addSegment flow uses this config to read the column's partition value from the `metadata.properties` of the uploaded segment, and then assigns the segment to an instance based on the partition id and the instance assignment ZK metadata.

This imposes two restrictions on batch jobs that create segments with `SegmentIndexCreationDriverImpl`:

- The partition column must be a table column (the primary key).
- The data must be partitioned with one of the configured [partition functions](https://github.com/apache/pinot/blob/a5c728f549fe1be5560a88080caaa2063def3d87/pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/PartitionFunctionFactory.java#L31).

These restrictions do not apply to stream-generated segments created during realtime ingestion. There, the partition id is identified from the LLC segment name convention. The stream is partitioned externally, and the Pinot table does not need to be aware of the partition column or function.

## Proposal

Enhance the `SegmentAssignment.assignSegment()` flow to rely on an externally provided partition id for an uploaded segment.

1. Persist the partition id as part of the segment ZK metadata.
2. Modify `StrictRealtimeSegmentAssignment.assignSegment` to get the partition id from the ZK metadata.
3. Provide the partition id externally:
   1. Option 1: Provide the partition id as an HTTP header during segment upload.
   2. Option 2: Provide the partition id as part of the uploaded segment's metadata (`metadata.properties`), not as `columnPartitionMap`.

Related: #10896, #11914
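To make the two options concrete, here is a minimal sketch of how a partition id for an uploaded segment might be resolved. It is not Pinot code: the class name, the header name `UPLOADED_SEGMENT_PARTITION_ID`, and the property key `segment.uploaded.partition.id` are hypothetical, and the lookup order (header, then segment metadata, then the LLC-style name `table__partitionId__sequence__creationTime`) is an assumption made only for illustration.

```java
import java.util.Map;
import java.util.OptionalInt;

// Illustrative sketch only; none of these names exist in Pinot.
class UploadedSegmentPartitionResolver {
    // Hypothetical HTTP header carrying the partition id (Option 1).
    static final String PARTITION_ID_HEADER = "UPLOADED_SEGMENT_PARTITION_ID";
    // Hypothetical key in the segment's metadata.properties (Option 2).
    static final String PARTITION_ID_PROPERTY = "segment.uploaded.partition.id";

    // Parses a non-negative integer; returns empty for null or malformed input.
    static OptionalInt parsePartitionId(String raw) {
        if (raw == null) {
            return OptionalInt.empty();
        }
        try {
            int id = Integer.parseInt(raw.trim());
            return id >= 0 ? OptionalInt.of(id) : OptionalInt.empty();
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    // Reads the partition id from an LLC-style name: table__partitionId__sequence__creationTime.
    static OptionalInt fromLlcSegmentName(String segmentName) {
        String[] parts = segmentName.split("__");
        return parts.length == 4 ? parsePartitionId(parts[1]) : OptionalInt.empty();
    }

    // Assumed precedence: upload header, then segment metadata, then the segment name.
    static int resolvePartitionId(String segmentName,
                                  Map<String, String> uploadHeaders,
                                  Map<String, String> segmentProperties) {
        OptionalInt fromHeader = parsePartitionId(uploadHeaders.get(PARTITION_ID_HEADER));
        if (fromHeader.isPresent()) {
            return fromHeader.getAsInt();
        }
        OptionalInt fromProperties = parsePartitionId(segmentProperties.get(PARTITION_ID_PROPERTY));
        if (fromProperties.isPresent()) {
            return fromProperties.getAsInt();
        }
        OptionalInt fromName = fromLlcSegmentName(segmentName);
        if (fromName.isPresent()) {
            return fromName.getAsInt();
        }
        throw new IllegalStateException("No partition id available for segment: " + segmentName);
    }
}
```

Under this sketch, the resolved id would be what step 1 persists in the segment ZK metadata and what step 2 reads back during assignment, so the table config would no longer need to know the partition column or function.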
