kfaraz commented on code in PR #17770: URL: https://github.com/apache/druid/pull/17770#discussion_r1977751659
########## server/src/main/java/org/apache/druid/server/coordinator/duty/KillUnusedSegments.java: ########## @@ -273,9 +273,17 @@ private Interval findIntervalForKill( ) { final DateTime minStartTime = datasourceToLastKillIntervalEnd.get(dataSource); + + // Once the first segment from a datasource is killed, we have a valid minStartTime. + // Restricting the upper bound to scan segments metadata while running the kill task results in a efficient SQL query. + final DateTime maxIntervalFromLastKill = minStartTime != null + ? DateTimes.min(minStartTime.plus(maxIntervalToKill), + DateTimes.nowUtc().minus(durationToRetain)) + : DateTimes.nowUtc().minus(durationToRetain); + Review Comment: It's probably cleaner to just initialize the `maxEndTime` directly like so: ```suggestion final DateTime maxEndTime; if (ignoreDurationToRetain) { maxEndTime = DateTimes.COMPARE_DATE_AS_STRING_MAX; } else if (minStartTime == null) { maxEndTime = DateTimes.nowUtc().minus(durationToRetain); } else { // If we have already killed a segment, limit the kill interval based on the minStartTime maxEndTime = DateTimes.min( DateTimes.nowUtc().minus(durationToRetain), minStartTime.plus(maxIntervalToKill) ); } = minStartTime != null ? DateTimes.min(minStartTime.plus(maxIntervalToKill), DateTimes.nowUtc().minus(durationToRetain)) : DateTimes.nowUtc().minus(durationToRetain); ``` ########## server/src/main/java/org/apache/druid/server/coordinator/duty/KillUnusedSegments.java: ########## @@ -273,9 +273,17 @@ private Interval findIntervalForKill( ) { final DateTime minStartTime = datasourceToLastKillIntervalEnd.get(dataSource); + + // Once the first segment from a datasource is killed, we have a valid minStartTime. + // Restricting the upper bound to scan segments metadata while running the kill task results in a efficient SQL query. + final DateTime maxIntervalFromLastKill = minStartTime != null + ? DateTimes.min(minStartTime.plus(maxIntervalToKill), + DateTimes.nowUtc().minus(durationToRetain)) + : DateTimes.nowUtc().minus(durationToRetain); + final DateTime maxEndTime = ignoreDurationToRetain Review Comment: You may remove this assignment if you use the code suggestion above. ########## server/src/test/java/org/apache/druid/server/coordinator/duty/KillUnusedSegmentsTest.java: ########## @@ -604,6 +608,29 @@ public void testLowerMaxIntervalToKill() validateLastKillStateAndReset(DS1, YEAR_OLD); } + @Test + public void testRestrictKillQueryToMaxInterval() + { + configBuilder.withDurationToRetain(RETAIN_DURATION.toStandardDuration()) + .withMaxIntervalToKill(MAX_KILL_INTERVAL); + + initDuty(); + + createAndAddUnusedSegment(DS1, MONTH_OLD, VERSION, NOW.minusDays(29)); Review Comment: ```suggestion // For a new datasource, the duration to retain is used to determine kill interval createAndAddUnusedSegment(DS1, MONTH_OLD, VERSION, NOW.minusDays(29)); ``` ########## server/src/test/java/org/apache/druid/server/coordinator/duty/KillUnusedSegmentsTest.java: ########## @@ -604,6 +608,29 @@ public void testLowerMaxIntervalToKill() validateLastKillStateAndReset(DS1, YEAR_OLD); } + @Test + public void testRestrictKillQueryToMaxInterval() Review Comment: ```suggestion public void testMaxIntervalToKillOverridesDurationToRetain() ``` ########## server/src/test/java/org/apache/druid/server/coordinator/duty/KillUnusedSegmentsTest.java: ########## @@ -87,6 +88,9 @@ public class KillUnusedSegmentsTest private static final String DS2 = "DS2"; private static final String DS3 = "DS3"; + private static final Period RETAIN_DURATION = Period.hours(6); Review Comment: Non-blocker: Maybe also add a test where retain duration is bigger so that it leads to smaller kill interval and thus is given precedence over the `maxIntervalToKill`. ########## server/src/test/java/org/apache/druid/server/coordinator/duty/KillUnusedSegmentsTest.java: ########## @@ -604,6 +608,29 @@ public void testLowerMaxIntervalToKill() validateLastKillStateAndReset(DS1, YEAR_OLD); } + @Test + public void testRestrictKillQueryToMaxInterval() + { + configBuilder.withDurationToRetain(RETAIN_DURATION.toStandardDuration()) + .withMaxIntervalToKill(MAX_KILL_INTERVAL); Review Comment: ```suggestion configBuilder.withDurationToRetain(RETAIN_DURATION.toStandardDuration()) .withMaxIntervalToKill(MAX_KILL_INTERVAL); ``` ########## server/src/test/java/org/apache/druid/server/coordinator/duty/KillUnusedSegmentsTest.java: ########## @@ -604,6 +608,29 @@ public void testLowerMaxIntervalToKill() validateLastKillStateAndReset(DS1, YEAR_OLD); } + @Test + public void testRestrictKillQueryToMaxInterval() + { + configBuilder.withDurationToRetain(RETAIN_DURATION.toStandardDuration()) + .withMaxIntervalToKill(MAX_KILL_INTERVAL); Review Comment: Nit: The fields `RETAIN_DURATION` and `MAX_KILL_INTERVAL` probably don't need to be constants as they are only used in the new test. Not using constants here might make the test more readable. ########## server/src/test/java/org/apache/druid/server/coordinator/duty/KillUnusedSegmentsTest.java: ########## @@ -604,6 +608,29 @@ public void testLowerMaxIntervalToKill() validateLastKillStateAndReset(DS1, YEAR_OLD); } + @Test + public void testRestrictKillQueryToMaxInterval() + { + configBuilder.withDurationToRetain(RETAIN_DURATION.toStandardDuration()) + .withMaxIntervalToKill(MAX_KILL_INTERVAL); + + initDuty(); + + createAndAddUnusedSegment(DS1, MONTH_OLD, VERSION, NOW.minusDays(29)); + CoordinatorRunStats newDatasourceStats = runDutyAndGetStats(); + + Assert.assertEquals(1, newDatasourceStats.get(Stats.Kill.ELIGIBLE_UNUSED_SEGMENTS, DS1_STAT_KEY)); + validateLastKillStateAndReset(DS1, MONTH_OLD); + + createAndAddUnusedSegment(DS1, FIFTEEN_DAY_OLD, VERSION, NOW.minusDays(14)); Review Comment: ```suggestion // For a datasource where kill has already happened, maxIntervalToKill is used // if it leads to a smaller kill interval than durationToRetain createAndAddUnusedSegment(DS1, FIFTEEN_DAY_OLD, VERSION, NOW.minusDays(14)); ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org