somandal commented on code in PR #16341:
URL: https://github.com/apache/pinot/pull/16341#discussion_r2210908874
##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/TableRebalancer.java:
##########
@@ -1813,9 +1865,63 @@ public int fetch(String segmentName) {
}
}
+ @VisibleForTesting
+ @FunctionalInterface
+ interface DataLossRiskAssessor {
+ boolean hasDataLossRisk(String segmentName);
+ }
+
+ private static class DataLossRiskAssessorImpl implements
DataLossRiskAssessor {
+ private final String _tableNameWithType;
+ private final HelixManager _helixManager;
+ private final boolean _isPeerDownloadEnabled;
+ private final boolean _isUpsertOrDedupTable;
+ private final boolean _isPauselessEnabled;
+
+ private DataLossRiskAssessorImpl(String tableNameWithType, TableConfig
tableConfig, HelixManager helixManager) {
+ _tableNameWithType = tableNameWithType;
+ _helixManager = helixManager;
+ _isPeerDownloadEnabled =
!StringUtils.isEmpty(tableConfig.getValidationConfig().getPeerSegmentDownloadScheme());
+ _isUpsertOrDedupTable = tableConfig.isUpsertEnabled() ||
tableConfig.isDedupEnabled();
+ _isPauselessEnabled =
PauselessConsumptionUtils.isPauselessEnabled(tableConfig);
+ }
+
+ @Override
+ public boolean hasDataLossRisk(String segmentName) {
+ // If peer-download is disabled, no data loss risk exists
+ if (!_isPeerDownloadEnabled) {
+ return false;
+ }
+
+ SegmentZKMetadata segmentZKMetadata = ZKMetadataProvider
+ .getSegmentZKMetadata(_helixManager.getHelixPropertyStore(),
_tableNameWithType, segmentName);
+ if (segmentZKMetadata == null) {
+ return false;
+ }
+
+ // If the segment state is COMPLETED and the peer download URL is empty,
there is a data loss risk
+ if (!segmentZKMetadata.getStatus().isCompleted()) {
+ return
CommonConstants.Segment.METADATA_URI_FOR_PEER_DOWNLOAD.equals(segmentZKMetadata.getDownloadUrl());
+ }
+
+ // If the segment is not yet completed, then the following scenarios are
possible:
+ // - Non-upsert / non-dedup table: data loss scenarios are not possible.
Either the segment will restart
+ // consumption or the RealtimeSegmentValidationManager will kick in to
fix up the segment if pauseless is
+ // enabled
+ // - Upsert / dedup table: For non-pauseless tables, it is safe to move
the segment without data loss concerns.
+ // For pauseless tables, if the segment is still in CONSUMING state,
moving it is safe, but if it is in
+ // COMMITTING state then there is a risk of data loss on segment build
failures as well since the
+ // RealtimeSegmentValidationManager does not automatically try to fix
up these segments. To be safe it is best
+ // to return that there is a risk of data loss in case of race
conditions for pauseless enabled tables
+ // (rebalance updates IS at the same time as segment commit protocol
starts and moves it to COMMITTING)
+ return _isUpsertOrDedupTable && _isPauselessEnabled;
Review Comment:
@noob-se7en should this check only be done for segments in COMMITTING state?
right now it does it for all, but I guess even if we ask users to forceCommit
and pause ingestion, we can have some segments that are consuming, right? and
it's safe to move consuming segments
I had thought that by the time IS is updated it is possible the segment has
moved to COMMITING so that's why hadn't checked for that here, but maybe
that'll cause more issues where it always lands up failing rebalance for
downtime?
edit: updated this to also check for COMMITTING state so we allow moving
CONSUMING segments
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]