rohangarg commented on code in PR #14616:
URL: https://github.com/apache/druid/pull/14616#discussion_r1273262789


##########
server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java:
##########
@@ -90,19 +93,46 @@ public class SegmentLoadDropHandler implements DataSegmentChangeHandler
   private final SegmentManager segmentManager;
   private final ScheduledExecutorService exec;
   private final ServerTypeConfig serverTypeConfig;
-  private final ConcurrentSkipListSet<DataSegment> segmentsToDelete;
+  private final ConcurrentSkipListSet<DataSegment> segmentsToDrop;
   private final SegmentCacheManager segmentCacheManager;
 
   private volatile boolean started = false;
 
-  // Keep history of load/drop request status in a LRU cache to maintain idempotency if same request shows up
-  // again and to return status of a completed request. Maximum size of this cache must be significantly greater
-  // than number of pending load/drop requests. so that history is not lost too quickly.
-  private final Cache<DataSegmentChangeRequest, AtomicReference<Status>> requestStatuses;
+  /**
+   * Used to cache the status of a completed load or drop request until it has
+   * been served to the (Coordinator) client exactly once.
+   * <p>
+   * The cache is used as follows:
+   * <ol>
+   * <li>An entry with state PENDING is added to the cache upon receiving a
+   * request to load or drop a segment.</li>
+   * <li>A duplicate request received at this point is immediately answered with PENDING.</li>
+   * <li>Once the load/drop finishes, the entry is updated to either SUCCESS or FAILED.</li>
+   * <li>A duplicate request received at this point is immediately answered with
+   * SUCCESS or FAILED and the entry is removed from the cache.</li>
+   * <li>If the first request itself finishes after the load or drop has already
+   * completed, it is answered with a SUCCESS or FAILED and the entry is removed
+   * from the cache.</li>
+   * </ol>
+   * <p>
+   * Maximum size of this cache must be significantly greater than the number of

Review Comment:
   I would vote for removing the cache structure. As discussed internally as 
well, if we rely on HTTP status codes for correctness, I don't think a 
server-side cache is fully reliable.
   Another easy solution: once a defined number of queued requests is reached, 
return a retryable HTTP response so that the load queue peon can retry the 
request after some time, if needed.
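
   A minimal sketch of the second suggestion, under stated assumptions: 
`BoundedRequestGate`, its method names, and the choice of 202/503 as the 
accepted/retryable status codes are all hypothetical and not existing Druid 
code. It only illustrates rejecting new requests with a retryable status once 
a fixed number are already queued.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch (not Druid code): admits load/drop requests until a
 * fixed number are queued, then answers with a retryable HTTP status.
 */
class BoundedRequestGate
{
  // Assumed status codes; the actual codes would be a design decision.
  static final int HTTP_ACCEPTED = 202;
  static final int HTTP_SERVICE_UNAVAILABLE = 503;

  private final int maxQueued;
  private final AtomicInteger queued = new AtomicInteger();

  BoundedRequestGate(int maxQueued)
  {
    if (maxQueued <= 0) {
      throw new IllegalArgumentException("maxQueued must be positive");
    }
    this.maxQueued = maxQueued;
  }

  /**
   * Tries to reserve a slot for a new request.
   * Returns 202 if the request was queued, or 503 if the caller should
   * retry the same request later.
   */
  int tryEnqueue()
  {
    while (true) {
      final int current = queued.get();
      if (current >= maxQueued) {
        return HTTP_SERVICE_UNAVAILABLE;
      }
      // CAS so that concurrent callers never exceed maxQueued.
      if (queued.compareAndSet(current, current + 1)) {
        return HTTP_ACCEPTED;
      }
    }
  }

  /** Releases a slot once a queued load/drop has finished. */
  void markDone()
  {
    queued.updateAndGet(n -> Math.max(0, n - 1));
  }

  int queuedCount()
  {
    return queued.get();
  }
}
```

   With something like this, the server would not need to remember completed 
requests: a rejected request is simply re-sent by the client after a delay.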



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

