xiangfu0 commented on code in PR #18860:
URL: https://github.com/apache/pinot/pull/18860#discussion_r3487855159
##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -1108,6 +1111,47 @@ public void reloadSegment(String segmentName,
IndexLoadingConfig indexLoadingCon
_logger.warn("Skipping reload for segment: {} — concurrently offloaded
after dispatch", segmentName);
return;
}
+ // For an upsert table with metadata TTL and snapshots enabled, a reload
must not blindly re-scan the segment and
+ // re-add every primary key, as that would resurrect keys already
expired (metadataTTL) or deleted
+ // (deletedKeysTTL). Instead we rebuild the upsert metadata from the
persisted validDocIds snapshot (valid docs
+ // only). The snapshot is indexed by docId, so it is only meaningful
when the segment content (CRC) is unchanged:
+ // - normal reload: only proceed when the CRC matches and a snapshot
exists, else fail the reload;
+ // - forceDownload reload: always proceed, using the snapshot if
present (operator accepts a possible CRC
+ // mismatch), otherwise fall back to a regular full scan of the
downloaded segment.
+ // Gated on snapshots being enabled so tables that never persist a
snapshot (e.g. deletedKeysTTL-only without
+ // snapshot) keep the regular reload behavior instead of always failing.
+ byte[] validDocIdsSnapshotBytes = null;
+ UpsertContext upsertContext = isUpsertEnabled() ?
_tableUpsertMetadataManager.getContext() : null;
+ if (upsertContext != null && upsertContext.isSnapshotEnabled()
+ && (upsertContext.getMetadataTTL() > 0 ||
upsertContext.getDeletedKeysTTL() > 0)) {
+ File snapshotFile = new
File(SegmentDirectoryPaths.findSegmentDirectory(indexDir),
+ V1Constants.VALID_DOC_IDS_SNAPSHOT_FILE_NAME);
+ boolean snapshotExists = snapshotFile.exists();
+ boolean crcMatch = hasSameCRC(zkMetadata, localMetadata);
+ if (!forceDownload && (!crcMatch || !snapshotExists)) {
Review Comment:
This changes the existing reload contract for clusters that set
`instance.check.crc.on.segment.load=false`. Today that knob means 'skip CRC
enforcement and keep reloading the local segment'; this branch now hard-fails
TTL+snapshot upsert reloads even when CRC checking is explicitly disabled. That
is an operability/backward-compat regression for existing deployments. Please
gate the fail-fast path on `shouldCheckCRCOnSegmentLoad()` or add a separate
explicit knob for the stricter behavior.
##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -1108,6 +1111,47 @@ public void reloadSegment(String segmentName,
IndexLoadingConfig indexLoadingCon
_logger.warn("Skipping reload for segment: {} — concurrently offloaded
after dispatch", segmentName);
return;
}
+ // For an upsert table with metadata TTL and snapshots enabled, a reload
must not blindly re-scan the segment and
+ // re-add every primary key, as that would resurrect keys already
expired (metadataTTL) or deleted
+ // (deletedKeysTTL). Instead we rebuild the upsert metadata from the
persisted validDocIds snapshot (valid docs
+ // only). The snapshot is indexed by docId, so it is only meaningful
when the segment content (CRC) is unchanged:
+ // - normal reload: only proceed when the CRC matches and a snapshot
exists, else fail the reload;
+ // - forceDownload reload: always proceed, using the snapshot if
present (operator accepts a possible CRC
+ // mismatch), otherwise fall back to a regular full scan of the
downloaded segment.
+ // Gated on snapshots being enabled so tables that never persist a
snapshot (e.g. deletedKeysTTL-only without
+ // snapshot) keep the regular reload behavior instead of always failing.
+ byte[] validDocIdsSnapshotBytes = null;
+ UpsertContext upsertContext = isUpsertEnabled() ?
_tableUpsertMetadataManager.getContext() : null;
+ if (upsertContext != null && upsertContext.isSnapshotEnabled()
+ && (upsertContext.getMetadataTTL() > 0 ||
upsertContext.getDeletedKeysTTL() > 0)) {
+ File snapshotFile = new
File(SegmentDirectoryPaths.findSegmentDirectory(indexDir),
+ V1Constants.VALID_DOC_IDS_SNAPSHOT_FILE_NAME);
+ boolean snapshotExists = snapshotFile.exists();
+ boolean crcMatch = hasSameCRC(zkMetadata, localMetadata);
+ if (!forceDownload && (!crcMatch || !snapshotExists)) {
+ throw new IllegalStateException(String.format(
+ "Failing reload for segment: %s of upsert table with metadata
TTL: %s because %s. Use a forceDownload "
+ + "reload if you are okay with a CRC mismatch on the segment
and confident the local validDocIds "
+ + "snapshot's valid docs map to the deep-store segment;
otherwise the forceDownload reload will scan "
+ + "the entire segment data.", segmentName,
_tableNameWithType,
+ !crcMatch ? "the segment CRC has changed from: " +
localMetadata.getCrc() + " to: " + zkMetadata.getCrc()
+ : "no validDocIds snapshot is available"));
+ }
+ if (snapshotExists) {
+ if (forceDownload && !crcMatch) {
Review Comment:
`validDocIds` are docId-position based, so once `hasSameCRC()` says the
downloaded segment is a different copy there is no proof this local snapshot
still points at the same rows. Reusing it here can silently rebuild upsert
state against the wrong docIds and drop live keys or preserve expired/deleted
ones. Please fail closed on CRC/dataCRC mismatch unless the snapshot was
produced from the downloaded segment itself.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]