clintropolis commented on code in PR #18176:
URL: https://github.com/apache/druid/pull/18176#discussion_r2274219913


##########
server/src/main/java/org/apache/druid/segment/loading/SegmentCacheManager.java:
##########
@@ -39,116 +43,96 @@ public interface SegmentCacheManager
   boolean canHandleSegments();
 
   /**
-   * Return a list of cached segments from local disk, if any. This should be 
called only
-   * when {@link #canHandleSegments()} is true.
+   * Return a list of cached segments from local disk, if any. This should be 
called only when
+   * {@link #canHandleSegments()} is true.
    */
   List<DataSegment> getCachedSegments() throws IOException;
 
   /**
-   * Store a segment info file for the supplied segment on disk. This 
operation is idempotent when called
-   * multiple times for a given segment.
+   * Store a segment info file for the supplied segment on disk. This 
operation is idempotent when called multiple
+   * times for a given segment.
    */
   void storeInfoFile(DataSegment segment) throws IOException;
 
   /**
-   * Remove the segment info file for the supplied segment from disk. If the 
file cannot be
-   * deleted, do nothing.
+   * Remove the segment info file for the supplied segment from disk. If the 
file cannot be deleted, do nothing.
    *
-   * @see SegmentCacheManager#cleanup(DataSegment)
+   * @see SegmentCacheManager#drop(DataSegment)
    */
   void removeInfoFile(DataSegment segment);
 
+
   /**
-   * Returns a {@link ReferenceCountedSegmentProvider} that will be added by 
the {@link org.apache.druid.server.SegmentManager}
-   * to the {@link org.apache.druid.timeline.VersionedIntervalTimeline}. This 
method can be called multiple times
-   * by the {@link org.apache.druid.server.SegmentManager} and implementation 
can either return same {@link ReferenceCountedSegmentProvider}
-   * or a different {@link ReferenceCountedSegmentProvider}. Caller should not 
assume any particular behavior.
-   * <p>
-   * Returning a {@code ReferenceCountingSegment} will let custom 
implementations keep track of reference count for
-   * segments that the custom implementations are creating. That way, custom 
implementations can know when the segment
-   * is in use or not.
-   * </p>
-   * @param segment Segment to get on each download after service bootstrap
-   * @throws SegmentLoadingException If there is an error in loading the 
segment
-   * @see SegmentCacheManager#getBootstrapSegment(DataSegment, 
SegmentLazyLoadFailCallback)
+   * Given a {@link DataSegment}, which contains the instructions for where 
and how to fetch a {@link Segment} from
+   * deep storage, this method tries to load and subsequently serve it to 
callers via
+   * {@link #acquireSegment(DataSegment)} or {@link 
#acquireSegment(DataSegment, SegmentDescriptor)}. If the segment
+   * cannot be loaded either due to error or insufficient storage space, this 
method throws a
+   * {@link SegmentLoadingException}.
+   *
+   * @param segment Segment to get on each download (after service bootstrap)
+   * @throws SegmentLoadingException If there is an error in loading the 
segment or insufficient storage space
+   * @see SegmentCacheManager#bootstrap(DataSegment, 
SegmentLazyLoadFailCallback)
    */
-  ReferenceCountedSegmentProvider getSegment(DataSegment segment) throws 
SegmentLoadingException;
+  void load(DataSegment segment) throws SegmentLoadingException;
 
   /**
-   * Similar to {@link #getSegment(DataSegment)}, this method returns a {@link 
ReferenceCountedSegmentProvider} that will be
-   * added by the {@link org.apache.druid.server.SegmentManager} to the {@link 
org.apache.druid.timeline.VersionedIntervalTimeline}
-   * during startup on data nodes.
-   * @param segment Segment to retrieve during service bootstrap
+   * Similar to {@link #load(DataSegment)}, this method loads segments during 
startup on data nodes. Implementations of
+   * this method may be configured to use a larger than normal work pool that 
only exists during startup and is shutdown
+   * after startup by calling {@link #shutdownBootstrap()}
+   *
+   * @param segment    Segment to retrieve during service bootstrap
    * @param loadFailed Callback to execute when segment lazy load failed. This 
applies only when
    *                   {@code lazyLoadOnStart} is enabled
-   * @throws SegmentLoadingException - If there is an error in loading the 
segment
-   * @see SegmentCacheManager#getSegment(DataSegment)
+   * @throws SegmentLoadingException - If there is an error in loading the 
segment or insufficient storage space
+   * @see SegmentCacheManager#load(DataSegment)
+   * @see SegmentCacheManager#shutdownBootstrap()
    */
-  ReferenceCountedSegmentProvider getBootstrapSegment(
-      DataSegment segment,
-      SegmentLazyLoadFailCallback loadFailed
-  ) throws SegmentLoadingException;
+  void bootstrap(DataSegment segment, SegmentLazyLoadFailCallback loadFailed) 
throws SegmentLoadingException;
 
   /**
-   * This method fetches the files for the given segment if the segment is not 
downloaded already. It
-   * is not required to {@link #reserve(DataSegment)} before calling this 
method. If caller has not reserved
-   * the space explicitly via {@link #reserve(DataSegment)}, the 
implementation should reserve space on caller's
-   * behalf.
-   * If the space has been explicitly reserved already
-   *    - implementation should use only the reserved space to store segment 
files.
-   *    - implementation should not release the location in case of download 
erros and leave it to the caller.
-   * @throws SegmentLoadingException if there is an error in downloading files
-   */
-  File getSegmentFiles(DataSegment segment) throws SegmentLoadingException;
-
-  /**
-   * Asynchronously load the supplied segment into the page cache on each 
download after the service finishes bootstrapping.
-   * Equivalent to `cat segment_files > /dev/null` to force loading the 
segment index files into page cache so that
-   * later when the segment is queried, they are already in page cache and 
only a minor page fault needs to be triggered
-   * instead of a major page fault to make the query latency more consistent.
+   * Cleanup the segment files cache space used by the segment, releasing the 
{@link StorageLocation} reservation
    *
-   * @see SegmentCacheManager#loadSegmentIntoPageCacheOnBootstrap(DataSegment)
+   * @see SegmentCacheManager#removeInfoFile(DataSegment)
    */
-  void loadSegmentIntoPageCache(DataSegment segment);
+  void drop(DataSegment segment);
 
   /**
-   * Similar to {@link #loadSegmentIntoPageCache(DataSegment)}, but 
asynchronously load the supplied segment into the
-   * page cache during service bootstrap.
-   *
-   * @see SegmentCacheManager#loadSegmentIntoPageCache(DataSegment)
+   * Applies a {@link SegmentMapFunction} to a {@link Segment} if it is 
available in the cache. If not present in any
+   * storage location, this method will not attempt to download the {@link 
DataSegment} from deep storage. The
+   * {@link Segment} returned by this method is considered an open reference, 
cache implementations must not allow it
+   * to be dropped until it has been closed. As such, the returned {@link 
Segment} must be closed when the caller is
+   * finished doing segment things.
    */
-  void loadSegmentIntoPageCacheOnBootstrap(DataSegment segment);
+  Optional<Segment> acquireSegment(DataSegment dataSegment);
 
   /**
-   * Shutdown any previously set up bootstrap executor to save resources.
-   * This should be called after loading bootstrap segments into the page 
cache.
+   * Returns a {@link AcquireSegmentAction} for a given {@link DataSegment} 
and {@link SegmentDescriptor}, which returns
+   * the {@link Segment} if already present in the cache, or tries to fetch 
from deep storage and map if not. The
+   * {@link Segment} returned by this method is considered an open reference, 
cache implementations must not allow it
+   * to be dropped until it has been closed. As such, the returned {@link 
Segment} must be closed when the caller is
+   * finished doing segment things.
    */
-  void shutdownBootstrap();
-
-  boolean reserve(DataSegment segment);
+  AcquireSegmentAction acquireSegment(
+      DataSegment dataSegment,
+      SegmentDescriptor descriptor

Review Comment:
   not sure of a really clean way to get rid of the descriptors, the issue is 
that the ‘missing segment’ retry stuff works on descriptors (so it can request 
again with the missing descriptors). the only way i can think to do it is for 
caller to keep a map of `DataSegment` to  list of `SegmentDescriptor` - i 
believe it needs to be a list because of wonky overshadowing stuff with mixed 
segment granularities, so that `AcquireSegmentAction` has a `DataSegment` 
instead of a descriptor. I'm not sure it feels any better to do this instead, 
and i think we would want to have some validation checks to make sure that we 
actually have the same number of missings as descriptors we are reporting (i 
don't think it would really be possible for only some of the descriptors 
associated with a segment to be missing at least if we are grabbing them all at 
once)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to