rohityadav1993 commented on code in PR #13107:
URL: https://github.com/apache/pinot/pull/13107#discussion_r1603148256
##########
pinot-common/src/main/java/org/apache/pinot/common/metadata/segment/SegmentPartitionMetadata.java:
##########
@@ -48,6 +53,21 @@ public SegmentPartitionMetadata(
@Nonnull @JsonProperty("columnPartitionMap") Map<String, ColumnPartitionMetadata> columnPartitionMap) {
Preconditions.checkNotNull(columnPartitionMap);
_columnPartitionMap = columnPartitionMap;
+ _uploadedSegmentPartitionId = -1;
+ }
+
+ /**
+ * Constructor for the class.
+ *
+ * @param columnPartitionMap Column name to ColumnPartitionMetadata map.
+ */
+ @JsonCreator
+ public SegmentPartitionMetadata(
+      @Nullable @JsonProperty("columnPartitionMap") Map<String, ColumnPartitionMetadata> columnPartitionMap,
+      @Nullable @JsonProperty(value = "uploadedSegmentPartitionId", defaultValue = "-1")
Review Comment:
> I might have missed it, but how to configure SegmentPartitionConfig in
TableConfig for tables that allow to upload segments built and partitioned
externally?
We don't need to configure the table here, similar to how no such configuration is needed for realtime stream ingestion for upsert tables.
Providing some more context on why the change is needed:
There are two scenarios where data partitioning comes into play:
1. Query routing:
[[docs](https://docs.pinot.apache.org/operators/operating-pinot/tuning/routing#data-ingested-partitioned-by-some-column)]
Data partitioning is not a requirement here but a good optimization.
2. Segment assignment:
    a. If the data is partitioned on a single column with a Pinot-supported
algorithm, we configure the table as:
```
...
"tableIndexConfig": {
...
"segmentPartitionConfig": {
"columnPartitionMap": {
"memberId": {
"functionName": "Modulo",
"numPartitions": 3
}
}
},
...
},
```
**Partitioning for upsert tables**:
Consuming segment assignment: The stream is always externally
partitioned (either on the PK or on another field that still ensures all PKs of
a key land in the same partition) and does not need to use one of Pinot's
supported algorithms. `segmentPartitionConfig` need not be set for the upsert
table either. Each `LLCSegmentName` contains a partitionId substring which is
derived from the stream's partitionId. When assigning a segment to an instance,
we get the partition id by parsing the LLCSegmentName in
`SegmentUtils.getRealtimeSegmentPartitionId`.
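As an illustration of that parsing step, here is a minimal sketch (not the actual `LLCSegmentName`/`SegmentUtils` code) that recovers the partition id, assuming the `{tableName}__{partitionGroupId}__{sequenceNumber}__{creationTime}` naming layout:

```java
// Illustrative sketch only: recovers the stream partition id embedded in an
// LLC-style segment name by splitting on the "__" separator.
public class LLCSegmentNameSketch {
  private static final String SEPARATOR = "__";

  // Returns the partition id (second token) from an LLC-style segment name.
  static int getPartitionId(String segmentName) {
    String[] parts = segmentName.split(SEPARATOR);
    if (parts.length != 4) {
      throw new IllegalArgumentException("Not an LLC segment name: " + segmentName);
    }
    return Integer.parseInt(parts[1]);
  }

  public static void main(String[] args) {
    // e.g. "myTable__3__0__20240101T0000Z" -> partition id 3
    System.out.println(getPartitionId("myTable__3__0__20240101T0000Z"));
  }
}
```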
**Uploaded segment assignment**: Uploaded segments are not generated
with the LLCSegmentName convention. The only way to specify partitioning info is
via `segmentPartitionConfig` in the table config, which is not possible if the
stream uses custom partitioning.
If one wants to backfill/upload segments to such a custom-partitioned
stream, the uploaded segment must provide the partitionId so that segment
assignment can place the segments on the same instances as the consuming
segments of the same partition in the upsert table.
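For illustration, with this change the uploaded segment's partition metadata could carry that id explicitly, along the lines of (values are hypothetical; `uploadedSegmentPartitionId` is the new field introduced in this PR):

```json
{
  "columnPartitionMap": {},
  "uploadedSegmentPartitionId": 3
}
```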
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]