jihoonson commented on a change in pull request #11398:
URL: https://github.com/apache/druid/pull/11398#discussion_r668987030
##########
File path:
server/src/main/java/org/apache/druid/segment/loading/SegmentLoader.java
##########
@@ -29,10 +30,45 @@
* Loading segments from deep storage to local storage.
* Implementations must be thread-safe.
*/
+@UnstableApi
public interface SegmentLoader
{
boolean isSegmentLoaded(DataSegment segment);
- Segment getSegment(DataSegment segment, boolean lazy,
SegmentLazyLoadFailCallback loadFailed) throws SegmentLoadingException;
+
+ /**
+ * Returns a {@link ReferenceCountingSegment} that will be added by the
{@link org.apache.druid.server.SegmentManager}
+ * to the {@link org.apache.druid.timeline.VersionedIntervalTimeline}. This
method can be called multiple times
+ * by the {@link org.apache.druid.server.SegmentManager}
+ *
+ * Returning a {@code ReferenceCountingSegment} will let custom
implementations keep track of reference count for
+ * segments that the custom implementations are creating. That way, custom
implementations can know when the segment
+ * is in use or not.
+ * @param segment - {@link DataSegment} segment to download
+ * @param lazy - whether the loading is lazy
+ * @param loadFailed - Callback to invoke if the loading fails
+ * @return Segment object wrapped inside {@link ReferenceCountingSegment}.
+ * @throws SegmentLoadingException
+ */
+ ReferenceCountingSegment getSegment(DataSegment segment, boolean lazy,
SegmentLazyLoadFailCallback loadFailed) throws SegmentLoadingException;
+
File getSegmentFiles(DataSegment segment) throws SegmentLoadingException;
Review comment:
I understand that you need a layer for wrapping segments with
`ReferenceCountingSegment` that is extendable and customizable. But would it be
cleaner to separate the segment loader and reference counter wrapper? Having
two methods in the same interface that one returns `ReferenceCountingSegment`
and another returns the underlying segment file without reference seems strange
to me. AFAIT, this method is used only in ingestion. Perhaps we can add another
interface that wraps `SegmentLoader` and returns `ReferenceCountingSegment` for
query while ingestion still uses `SegmentLoader`.
##########
File path:
server/src/main/java/org/apache/druid/segment/loading/SegmentLoader.java
##########
@@ -29,10 +30,45 @@
* Loading segments from deep storage to local storage.
* Implementations must be thread-safe.
*/
+@UnstableApi
public interface SegmentLoader
{
boolean isSegmentLoaded(DataSegment segment);
- Segment getSegment(DataSegment segment, boolean lazy,
SegmentLazyLoadFailCallback loadFailed) throws SegmentLoadingException;
+
+ /**
+ * Returns a {@link ReferenceCountingSegment} that will be added by the
{@link org.apache.druid.server.SegmentManager}
+ * to the {@link org.apache.druid.timeline.VersionedIntervalTimeline}. This
method can be called multiple times
+ * by the {@link org.apache.druid.server.SegmentManager}
+ *
+ * Returning a {@code ReferenceCountingSegment} will let custom
implementations keep track of reference count for
+ * segments that the custom implementations are creating. That way, custom
implementations can know when the segment
+ * is in use or not.
+ * @param segment - {@link DataSegment} segment to download
+ * @param lazy - whether the loading is lazy
+ * @param loadFailed - Callback to invoke if the loading fails
+ * @return Segment object wrapped inside {@link ReferenceCountingSegment}.
+ * @throws SegmentLoadingException
+ */
+ ReferenceCountingSegment getSegment(DataSegment segment, boolean lazy,
SegmentLazyLoadFailCallback loadFailed) throws SegmentLoadingException;
+
File getSegmentFiles(DataSegment segment) throws SegmentLoadingException;
+
+ /**
+ * Tries to reserve the space for a segment on any location. We only return
a boolean result instead of a pointer to
+ * {@link StorageLocation} since we don't want callers to operate on {@code
StorageLocation} directly outside {@code SegmentLoader}.
+ * {@link SegmentLoader} operates on the {@code StorageLocation} objects in
a thread-safe manner.
+ *
+ * @param segment - Segment to reserve
+ * @return True if enough space found to store the segment, false otherwise
+ */
+ boolean reserve(DataSegment segment);
Review comment:
The javadoc of this and `release` methods seems a bit strange to me
because it talks about the `StorageLocation` interface which makes me think
`SegmentLoader` and `StorageLocation` are tied. Perhaps it should not mention
`StorageLocation`, but should make it clear what is expected for
implementations, such as `getSegment` must succeed if `reserve()` returns true.
BTW, how do you expect these methods to be used? When should the caller use
this method instead of `getSegment()`?
##########
File path:
server/src/main/java/org/apache/druid/segment/loading/StorageLocation.java
##########
@@ -154,7 +166,8 @@ public synchronized File reserve(String
segmentFilePathToAdd, String segmentId,
{
final File segmentFileToAdd = new File(path, segmentFilePathToAdd);
if (files.contains(segmentFileToAdd)) {
- return null;
+ //TODO: is this change ok?
+ return segmentFileToAdd;
Review comment:
Your reasoning makes sense to me, but the callers currently depend on
this behavior to avoid duplicate work. For example, [this
line](https://github.com/apache/druid/pull/11398/files#diff-bf96197b1a1e78ae0c5ccc6e1e0757eccbabe9b0b5c91035a438462440443580R286)
checks whether `storageDir` is null before downloading the segment. I think
this method should return something that is indicative of whether the call
reserves some space successfully or the call returns immediately at this line.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]