[GitHub] [druid] jihoonson commented on a change in pull request #11398: Make SegmentLoader extensible and customizable

GitBox Tue, 13 Jul 2021 11:44:28 -0700


jihoonson commented on a change in pull request #11398:
URL: https://github.com/apache/druid/pull/11398#discussion_r668987030




##########
File path: 
server/src/main/java/org/apache/druid/segment/loading/SegmentLoader.java
##########
@@ -29,10 +30,45 @@
  * Loading segments from deep storage to local storage.
  * Implementations must be thread-safe.
  */
+@UnstableApi
 public interface SegmentLoader
 {
   boolean isSegmentLoaded(DataSegment segment);
-  Segment getSegment(DataSegment segment, boolean lazy, 
SegmentLazyLoadFailCallback loadFailed) throws SegmentLoadingException;
+
+  /**
+   * Returns a {@link ReferenceCountingSegment} that will be added by the 
{@link org.apache.druid.server.SegmentManager}
+   * to the {@link org.apache.druid.timeline.VersionedIntervalTimeline}. This 
method can be called multiple times
+   * by the {@link org.apache.druid.server.SegmentManager}
+   *
+   * Returning a {@code ReferenceCountingSegment} will let custom 
implementations keep track of reference count for
+   * segments that the custom implementations are creating. That way, custom 
implementations can know when the segment
+   * is in use or not.
+   * @param segment - {@link DataSegment} segment to download
+   * @param lazy - whether the loading is lazy
+   * @param loadFailed - Callback to invoke if the loading fails
+   * @return Segment object wrapped inside {@link ReferenceCountingSegment}.
+   * @throws SegmentLoadingException
+   */
+  ReferenceCountingSegment getSegment(DataSegment segment, boolean lazy, 
SegmentLazyLoadFailCallback loadFailed) throws SegmentLoadingException;
+
   File getSegmentFiles(DataSegment segment) throws SegmentLoadingException;

Review comment:
       I understand that you need a layer for wrapping segments with 
`ReferenceCountingSegment` that is extendable and customizable. But would it be 
cleaner to separate the segment loader and reference counter wrapper? Having 
two methods in the same interface that one returns `ReferenceCountingSegment` 
and another returns the underlying segment file without reference seems strange 
to me. AFAIT, this method is used only in ingestion. Perhaps we can add another 
interface that wraps `SegmentLoader` and returns `ReferenceCountingSegment` for 
query while ingestion still uses `SegmentLoader`.

##########
File path: 
server/src/main/java/org/apache/druid/segment/loading/SegmentLoader.java
##########
@@ -29,10 +30,45 @@
  * Loading segments from deep storage to local storage.
  * Implementations must be thread-safe.
  */
+@UnstableApi
 public interface SegmentLoader
 {
   boolean isSegmentLoaded(DataSegment segment);
-  Segment getSegment(DataSegment segment, boolean lazy, 
SegmentLazyLoadFailCallback loadFailed) throws SegmentLoadingException;
+
+  /**
+   * Returns a {@link ReferenceCountingSegment} that will be added by the 
{@link org.apache.druid.server.SegmentManager}
+   * to the {@link org.apache.druid.timeline.VersionedIntervalTimeline}. This 
method can be called multiple times
+   * by the {@link org.apache.druid.server.SegmentManager}
+   *
+   * Returning a {@code ReferenceCountingSegment} will let custom 
implementations keep track of reference count for
+   * segments that the custom implementations are creating. That way, custom 
implementations can know when the segment
+   * is in use or not.
+   * @param segment - {@link DataSegment} segment to download
+   * @param lazy - whether the loading is lazy
+   * @param loadFailed - Callback to invoke if the loading fails
+   * @return Segment object wrapped inside {@link ReferenceCountingSegment}.
+   * @throws SegmentLoadingException
+   */
+  ReferenceCountingSegment getSegment(DataSegment segment, boolean lazy, 
SegmentLazyLoadFailCallback loadFailed) throws SegmentLoadingException;
+
   File getSegmentFiles(DataSegment segment) throws SegmentLoadingException;
+
+  /**
+   * Tries to reserve the space for a segment on any location. We only return 
a boolean result instead of a pointer to
+   * {@link StorageLocation} since we don't want callers to operate on {@code 
StorageLocation} directly outside {@code SegmentLoader}.
+   * {@link SegmentLoader} operates on the {@code StorageLocation} objects in 
a thread-safe manner.
+   *
+   * @param segment - Segment to reserve
+   * @return True if enough space found to store the segment, false otherwise
+   */
+  boolean reserve(DataSegment segment);

Review comment:
       The javadoc of this and `release` methods seems a bit strange to me 
because it talks about the `StorageLocation` interface which makes me think 
`SegmentLoader` and `StorageLocation` are tied. Perhaps it should not mention 
`StorageLocation`, but should make it clear what is expected for 
implementations, such as `getSegment` must succeed if `reserve()` returns true.
   
   BTW, how do you expect these methods to be used? When should the caller use 
this method instead of `getSegment()`?

##########
File path: 
server/src/main/java/org/apache/druid/segment/loading/StorageLocation.java
##########
@@ -154,7 +166,8 @@ public synchronized File reserve(String 
segmentFilePathToAdd, String segmentId,
   {
     final File segmentFileToAdd = new File(path, segmentFilePathToAdd);
     if (files.contains(segmentFileToAdd)) {
-      return null;
+      //TODO: is this change ok?
+      return segmentFileToAdd;

Review comment:
       Your reasoning makes sense to me, but the callers currently depend on 
this behavior to avoid duplicate work. For example, [this 
line](https://github.com/apache/druid/pull/11398/files#diff-bf96197b1a1e78ae0c5ccc6e1e0757eccbabe9b0b5c91035a438462440443580R286)
 checks whether `storageDir` is null before downloading the segment. I think 
this method should return something that is indicative of whether the call 
reserves some space successfully or the call returns immediately at this line. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] jihoonson commented on a change in pull request #11398: Make SegmentLoader extensible and customizable

Reply via email to