gianm commented on code in PR #19535:
URL: https://github.com/apache/druid/pull/19535#discussion_r3438128234


##########
server/src/main/java/org/apache/druid/segment/loading/SegmentLoaderConfig.java:
##########
@@ -105,6 +105,15 @@ public class SegmentLoaderConfig
   @JsonProperty("virtualStorageMetadataReservationEstimate")
   private long virtualStorageMetadataReservationEstimate = 16L * 1024L * 1024L;
 
+  /**
+   * When true, partial-eligible V10 segments are mounted via the partial 
machinery and
+   * {@link SegmentCacheManager#acquireSegment} with {@link 
AcquireMode#PARTIAL} returns a metadata-anchored segment
+   * whose columns are downloaded on demand. When false (the default), {@link 
AcquireMode#PARTIAL} falls back to
+   * {@link AcquireMode#FULL} so the entire segment is downloaded up front 
(matching pre-partial-download behavior).
+   */
+  @JsonProperty("virtualStoragePartialDownloadsEnabled")
+  private boolean virtualStoragePartialDownloadsEnabled = false;

Review Comment:
   IMO it would be better to have this always enabled at the VSF layer, and to 
have the toggle be in the query context. There can be something like 
`QueryContexts#getAcquireMode`. (Something like `usePartialDownloads: 
true/false`). I like this approach because it makes testing much easier, and 
also it makes the code in the VSF layer easier to follow because it gets rid of 
conditionals. There would also be no reason to get rid of partially-downloaded 
directories during bootstrapping, because they're always considered OK.
   
   Default should be `usePartialDownloads: false` for now.



##########
multi-stage-query/src/main/java/org/apache/druid/msq/querykit/ReadableInputQueue.java:
##########
@@ -288,7 +298,7 @@ private ListenableFuture<ReadableInput> loadNextSegment()
         return null;
       }
 
-      final AcquireSegmentAction acquireSegmentAction = 
nextLoadableSegment.acquire();
+      final AcquireSegmentAction acquireSegmentAction = 
nextLoadableSegment.acquire(acquireMode);

Review Comment:
   It's been clear in testing that good overall performance on cold data is 
highly dependent on loadahead. With this change, we aren't fetching the entire 
segment as part of loadahead, which is going to lead to poorer download rates. 
It's ok to deal with this in a follow-up, but it does need to be dealt with 
somehow.
   
   Either the query logic should declare what parts of the segment it wants to 
start with, or we should do something statistical: for each query, observe 
which parts of the segment are actually used, and fetch those during loadahead 
for future segments. The statistical approach only works for queries that fetch 
a large number of segments (so the earlier ones can be used to inform the later 
ones). But this is fine, since it's those queries where the performance concern 
lies.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to