klsince commented on code in PR #13837:
URL: https://github.com/apache/pinot/pull/13837#discussion_r1746526919
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/IngestionUtils.java:
##########
@@ -169,6 +170,16 @@ private static SegmentNameGenerator getSegmentNameGenerator(BatchConfig batchCon
case BatchConfigProperties.SegmentNameGeneratorType.SIMPLE:
return new SimpleSegmentNameGenerator(rawTableName, batchConfig.getSegmentNamePostfix(),
batchConfig.isAppendUUIDToSegmentName(), batchConfig.isExcludeTimeInSegmentName());
+ case BatchConfigProperties.SegmentNameGeneratorType.UPLOADED_REALTIME:
+ int uploadedRealtimePartitionId;
+ try {
+ uploadedRealtimePartitionId = Integer.parseInt(batchConfig.getUploadedRealtimePartitionId());
+ } catch (NumberFormatException e) {
+ throw new IllegalArgumentException(
+ String.format("Invalid uploadedRealtimePartitionId: %s", batchConfig.getUploadedRealtimePartitionId()));
+ }
+ return new UploadedRealtimeSegmentNameGenerator(rawTableName, uploadedRealtimePartitionId,
+ batchConfig.getSegmentUploadTimeMs(), batchConfig.getSegmentNamePrefix(), batchConfig.getSequenceId());
Review Comment:
I see, and I just spotted in the README that each task generates segments for one particular table partition, so it makes sense to use a sequenceId that is local to each task.
```
The parallelism of the job *must* be set the same as the number of partitions of the Pinot table(same as upstream), so that the sink in each task executor can generate the segment of same partitions.
```
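To make the reasoning concrete, here is a minimal Java sketch (not the Pinot implementation) of why a per-task sequenceId is enough once each task owns exactly one partition. It assumes a hypothetical name layout `<prefix>__<table>__<partitionId>__<uploadTimeMs>__<sequenceId>`, assumes all tasks share the same upload time (the worst case), and assumes task `i` writes partition `i % numPartitions`. The class and method names (`PerTaskSequenceIdSketch`, `segmentName`, `simulate`, `countUnique`) are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class PerTaskSequenceIdSketch {

  // HYPOTHETICAL name layout, for illustration only:
  //   <prefix>__<table>__<partitionId>__<uploadTimeMs>__<sequenceId>
  // Not claimed to match Pinot's actual uploaded-realtime segment name format.
  static String segmentName(String prefix, String table, int partitionId, long uploadTimeMs, int sequenceId) {
    return String.join("__", prefix, table, Integer.toString(partitionId),
        Long.toString(uploadTimeMs), Integer.toString(sequenceId));
  }

  // Simulates `parallelism` sink tasks writing to a table with `numPartitions` partitions.
  // ASSUMPTION: task i writes only partition (i % numPartitions), and each task keeps its
  // own sequenceId counter starting at 0, with no coordination between tasks.
  static List<String> simulate(int parallelism, int numPartitions, int segmentsPerTask, long uploadTimeMs) {
    List<String> names = new ArrayList<>();
    for (int task = 0; task < parallelism; task++) {
      int partitionId = task % numPartitions;
      int localSequenceId = 0; // local to this task only
      for (int s = 0; s < segmentsPerTask; s++) {
        names.add(segmentName("prefix", "myTable", partitionId, uploadTimeMs, localSequenceId++));
      }
    }
    return names;
  }

  // Number of distinct names; a value below names.size() means two segments collided.
  static int countUnique(List<String> names) {
    return new HashSet<>(names).size();
  }

  public static void main(String[] args) {
    long uploadTimeMs = 1_700_000_000_000L; // same upload time for every task (worst case)

    // Parallelism equals the partition count: one task per partition, so names never collide.
    List<String> matched = simulate(4, 4, 3, uploadTimeMs);
    System.out.println(matched.size() + " names, " + countUnique(matched) + " unique"); // 12 names, 12 unique

    // Parallelism exceeds the partition count: two tasks share each partition and reuse sequenceIds.
    List<String> oversubscribed = simulate(8, 4, 3, uploadTimeMs);
    System.out.println(oversubscribed.size() + " names, " + countUnique(oversubscribed) + " unique"); // 24 names, 12 unique
  }
}
```

Under these assumptions the partitionId already disambiguates tasks, so the sequenceId only has to be unique within a task; the second run shows how that guarantee breaks if two tasks end up writing the same partition, which is what the README's parallelism requirement rules out.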
--
This is an automated message from the Apache Git Service.