Re: [PR] [Backfill] allow externally partitioned segment uploads for upsert tables [pinot]

via GitHub Fri, 17 May 2024 11:09:05 -0700


klsince commented on code in PR #13107:
URL: https://github.com/apache/pinot/pull/13107#discussion_r1605390671



##########
pinot-common/src/main/java/org/apache/pinot/common/metadata/segment/SegmentPartitionMetadata.java:
##########
@@ -48,6 +53,21 @@ public SegmentPartitionMetadata(
       @Nonnull @JsonProperty("columnPartitionMap") Map<String, 
ColumnPartitionMetadata> columnPartitionMap) {
     Preconditions.checkNotNull(columnPartitionMap);
     _columnPartitionMap = columnPartitionMap;
+    _uploadedSegmentPartitionId = -1;
+  }
+
+  /**
+   * Constructor for the class.
+   *
+   * @param columnPartitionMap Column name to ColumnPartitionMetadata map.
+   */
+  @JsonCreator
+  public SegmentPartitionMetadata(
+      @Nullable @JsonProperty("columnPartitionMap") Map<String, 
ColumnPartitionMetadata> columnPartitionMap,
+      @Nullable @JsonProperty(value = "uploadedSegmentPartitionId", 
defaultValue = "-1")

Review Comment:
   I actually dropped {sequenceId} intentionally, otherwise, the segment name 
becomes LLC format, and would cause the complexities you mentioned in another 
alternative proposal, like the checks on start/end offsets. In fact, the 
endOffset of latest LLC segment (the one with max seqId) is used to resume 
stream consumption, and we probably don't want to affect that logic with 
uploaded segments.
   
   > Do we enforce naming conventions? ... ... add a SegmentNameGenerator 
implementation ...
   
   I don't think so. We may need one just for uploading segments to upsert 
tables, due to its special requirements on partition id and to break comparison 
ties. And good point for the need of a new name generator.
   
   btw, from what I learnt while reviewing this PR, there seems a design choice 
for RT segments that the {partitionId} encoded in the segment name is the 
source of truth, rather than the one kept in ZK metadata as there might no 
partition info at all in ZK metadata. So following on that design principle, 
I'd prefer to keep encoding the {partitionId} in segment name for uploaded 
segments as well. In this way, we can 1) avoid the cost of reading ZK metadata 
every time we need partitionId for the uploaded segments; 2) avoid changes on 
persistent segment metadata in ZK or on disk, which might make things a bit 
easier when to consider upgrade/downgrade.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [Backfill] allow externally partitioned segment uploads for upsert tables [pinot]

Reply via email to