suneet-s commented on code in PR #14831:
URL: https://github.com/apache/druid/pull/14831#discussion_r1296195438


##########
server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinator.java:
##########
@@ -624,6 +630,27 @@ CompactSegments initializeCompactSegmentsDuty(CompactionSegmentSearchPolicy comp
     }
   }
 
+  @Nullable
+  @VisibleForTesting
+  KillUnusedSegments initializeKillUnusedSegmentsDuty()
+  {
+    List<KillUnusedSegments> killUnusedSegmentsDutyFromCustomGroups = getKillUnusedSegmentsDutyFromCustomGroups();
+    if (!config.getAutoKillEnabled()) {
+      return null;
+    }

Review Comment:
   nit:
   
   ```suggestion
       if (!config.getAutoKillEnabled()) {
         return null;
       }
       List<KillUnusedSegments> killUnusedSegmentsDutyFromCustomGroups = getKillUnusedSegmentsDutyFromCustomGroups();
   ```
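   A minimal, self-contained sketch of the reordering this suggestion asks for (class and method names below are illustrative stand-ins, not Druid's real classes): checking the flag first means the early return skips the custom-group scan entirely.

   ```java
   import java.util.List;
   import java.util.concurrent.atomic.AtomicInteger;

   class EarlyReturnSketch
   {
     // Counts how many times the (stand-in) custom-group scan runs.
     static final AtomicInteger customGroupLookups = new AtomicInteger();

     // Stands in for getKillUnusedSegmentsDutyFromCustomGroups().
     static List<String> getDutiesFromCustomGroups()
     {
       customGroupLookups.incrementAndGet();
       return List.of();
     }

     // Flag check first: when auto-kill is off we return immediately,
     // without paying for the custom-group lookup.
     static List<String> initializeDuty(boolean autoKillEnabled)
     {
       if (!autoKillEnabled) {
         return null;
       }
       return getDutiesFromCustomGroups();
     }
   }
   ```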



##########
server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinatorConfig.java:
##########
@@ -43,9 +45,16 @@ public abstract class DruidCoordinatorConfig
   @Default("PT1H")
   public abstract Duration getCoordinatorMetadataStoreManagementPeriod();
 
+  @Config("druid.coordinator.kill.on")
+  @Default("false")
+  public abstract Boolean getAutoKillEnabled();
+
+  @Nullable
   @Config("druid.coordinator.kill.period")
-  @Default("P1D")
-  public abstract Duration getCoordinatorKillPeriod();
+  public Duration getCoordinatorKillPeriod()
+  {
+    return null;

Review Comment:
   ```suggestion
       return getCoordinatorIndexingPeriod();
   ```
   
   This should simplify some of the null handling added elsewhere and make it clear that the default is the indexing period.
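   A minimal, self-contained sketch of the suggested pattern (names are stand-ins for the Druid config class): the kill period's default delegates to the indexing period instead of returning null, so callers never need a null check.

   ```java
   import java.time.Duration;

   abstract class CoordinatorConfigSketch
   {
     abstract Duration getCoordinatorIndexingPeriod();

     // The config framework overrides this when druid.coordinator.kill.period
     // is set explicitly; otherwise the indexing period is the default.
     Duration getCoordinatorKillPeriod()
     {
       return getCoordinatorIndexingPeriod();
     }
   }

   // A fixed configuration for demonstration: indexing period of 30 minutes.
   class FixedConfig extends CoordinatorConfigSketch
   {
     @Override
     Duration getCoordinatorIndexingPeriod()
     {
       return Duration.ofMinutes(30);
     }
   }
   ```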



##########
server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinatorConfig.java:
##########
@@ -43,9 +45,16 @@ public abstract class DruidCoordinatorConfig
   @Default("PT1H")
   public abstract Duration getCoordinatorMetadataStoreManagementPeriod();
 
+  @Config("druid.coordinator.kill.on")
+  @Default("false")
+  public abstract Boolean getAutoKillEnabled();
+
+  @Nullable

Review Comment:
   ```suggestion
   ```



##########
server/src/main/java/org/apache/druid/metadata/SegmentsMetadataManager.java:
##########
@@ -144,6 +144,12 @@ Optional<Iterable<DataSegment>> iterateAllUsedNonOvershadowedSegmentsForDatasour
    */
   List<Interval> getUnusedSegmentIntervals(String dataSource, DateTime maxEndTime, int limit);

Review Comment:
   Is there a reason not to get rid of this function from the interface? It looks like there are no callers other than the KillUnusedSegmentsDuty.



##########
docs/configuration/index.md:
##########
@@ -853,7 +853,7 @@ These Coordinator static configurations can be defined in the `coordinator/runti
 |`druid.coordinator.load.timeout`|The timeout duration for when the Coordinator assigns a segment to a Historical process.|PT15M|
 |`druid.coordinator.kill.pendingSegments.on`|Boolean flag for whether or not the Coordinator clean up old entries in the `pendingSegments` table of metadata store. If set to true, Coordinator will check the created time of most recently complete task. If it doesn't exist, it finds the created time of the earliest running/pending/waiting tasks. Once the created time is found, then for all dataSources not in the `killPendingSegmentsSkipList` (see [Dynamic configuration](#dynamic-configuration)), Coordinator will ask the Overlord to clean up the entries 1 day or more older than the found created time in the `pendingSegments` table. This will be done periodically based on `druid.coordinator.period.indexingPeriod` specified.|true|
 |`druid.coordinator.kill.on`|Boolean flag for whether or not the Coordinator should submit kill task for unused segments, that is, permanently delete them from metadata store and deep storage. If set to true, then for all whitelisted dataSources (or optionally all), Coordinator will submit tasks periodically based on `period` specified. A whitelist can be set via dynamic configuration `killDataSourceWhitelist` described later.<br /><br />When `druid.coordinator.kill.on` is true, segments are eligible for permanent deletion once their data intervals are older than `druid.coordinator.kill.durationToRetain` relative to the current time. If a segment's data interval is older than this threshold at the time it is marked unused, it is eligible for permanent deletion immediately after being marked unused.|false|
-|`druid.coordinator.kill.period`|How often to send kill tasks to the indexing service. Value must be greater than `druid.coordinator.period.indexingPeriod`. Only applies if kill is turned on.|P1D (1 Day)|
+|`druid.coordinator.kill.period`|How often to send kill tasks to the indexing service. Value must be greater than `druid.coordinator.period.indexingPeriod` if set. Only applies if kill is turned on.|NULL (indexing period is used)|

Review Comment:
   This change should be release noted. Can you update the PR description to make sure it gets picked up in the release notes, please.



##########
server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinator.java:
##########
@@ -624,6 +630,27 @@ CompactSegments initializeCompactSegmentsDuty(CompactionSegmentSearchPolicy comp
     }
   }
 
+  @Nullable
+  @VisibleForTesting
+  KillUnusedSegments initializeKillUnusedSegmentsDuty()
+  {
+    List<KillUnusedSegments> killUnusedSegmentsDutyFromCustomGroups = getKillUnusedSegmentsDutyFromCustomGroups();
+    if (!config.getAutoKillEnabled()) {
+      return null;
+    }
+    if (killUnusedSegmentsDutyFromCustomGroups.isEmpty()) {
+      return new KillUnusedSegments(segmentsMetadataManager, overlordClient, config);
+    } else {
+      if (killUnusedSegmentsDutyFromCustomGroups.size() > 1) {
+        log.warn(

Review Comment:
   Can you throw an exception here instead? This is a misconfiguration, and it would be better to fail startup so the operator can fix the config issue.
   
   In the exception message, can you add the names of the custom groups where the duty is configured, to make it easy for the operator to fix the issue.
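   A rough sketch of the fail-fast behavior being suggested, with simplified stand-in types (not Druid's real duty-group classes): collect the names of the offending groups and include them in the exception message.

   ```java
   import java.util.List;
   import java.util.stream.Collectors;

   class DutyGroup
   {
     final String name;
     final boolean hasKillDuty;

     DutyGroup(String name, boolean hasKillDuty)
     {
       this.name = name;
       this.hasKillDuty = hasKillDuty;
     }
   }

   class DutyValidator
   {
     // Throws at startup if KillUnusedSegments is configured in more than one
     // custom duty group, naming the groups so the operator can fix the config.
     static void validateSingleKillDuty(List<DutyGroup> groups)
     {
       List<String> groupsWithKill = groups.stream()
           .filter(g -> g.hasKillDuty)
           .map(g -> g.name)
           .collect(Collectors.toList());
       if (groupsWithKill.size() > 1) {
         throw new IllegalStateException(
             "KillUnusedSegments is configured in multiple custom duty groups "
             + groupsWithKill + "; it may appear in at most one."
         );
       }
     }
   }
   ```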



##########
server/src/main/java/org/apache/druid/server/coordinator/duty/KillUnusedSegments.java:
##########
@@ -63,25 +67,32 @@ public class KillUnusedSegments implements CoordinatorDuty
                 && (KILL_TASK_TYPE.equals(status.getType()) && status.getId().startsWith(TASK_ID_PREFIX));
   private static final Logger log = new Logger(KillUnusedSegments.class);
 
-  private final long period;
+  @Nullable private final Long period;
   private final long retainDuration;
   private final boolean ignoreRetainDuration;
   private final int maxSegmentsToKill;
+
+  /**
+   * Used to keep track of the last interval end time that was killed for each
+   * datasource.
+   */
+  private final Map<String, DateTime> datasourceToLastKillIntervalEnd;

Review Comment:
   I'm not sure if this approach is the best way. If the leader changes, the in-memory map is essentially wiped out when a new leader comes online, so it is possible to re-schedule an interval that is already in the process of being killed. So, 2 questions on this:
   
   1. What is the impact if the same interval is scheduled to be killed twice? Can we test / validate that nothing bad happens? If the task just waits on a lock and then succeeds as there is no data to delete, I think that is reasonable.
   2. Is it possible to get the list of intervals that are being killed from the kill task payloads? Would that put a lot of stress on the master nodes if we polled the task payloads?
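   For reference, a simplified, self-contained sketch of the kind of in-memory tracking the diff introduces (hypothetical names, not the PR's actual implementation). Because the map lives only in the coordinator's memory, a leader change resets it, which is exactly the re-scheduling concern above.

   ```java
   import java.time.Instant;
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;

   class KillIntervalTracker
   {
     // Last killed interval end per datasource; lost on leader change.
     private final Map<String, Instant> datasourceToLastKillIntervalEnd = new ConcurrentHashMap<>();

     // True if the candidate interval end advances past what has already
     // been scheduled for this datasource.
     boolean shouldSchedule(String dataSource, Instant candidateEnd)
     {
       Instant last = datasourceToLastKillIntervalEnd.get(dataSource);
       return last == null || candidateEnd.isAfter(last);
     }

     // Record a scheduled kill, keeping the later of the two end times.
     void recordScheduled(String dataSource, Instant intervalEnd)
     {
       datasourceToLastKillIntervalEnd.merge(
           dataSource,
           intervalEnd,
           (old, next) -> next.isAfter(old) ? next : old
       );
     }
   }
   ```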



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

