wjhypo commented on a change in pull request #11309:
URL: https://github.com/apache/druid/pull/11309#discussion_r739540220
##########
File path:
server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java
##########
@@ -369,7 +392,7 @@ private void addSegments(Collection<DataSegment> segments,
final DataSegmentChan
numSegments,
segment.getId()
);
- loadSegment(segment, callback, config.isLazyLoadOnStart());
+ loadSegment(segment, callback, config.isLazyLoadOnStart(),
loadSegmentsIntoPageCacheOnBootstrapExec);
Review comment:
Hi Parag, thanks for the comment! To clarify, the point of the PR is to
provide best-effort precaching of segments, not a 100% guaranteed precache.
The reason is that we don't want to sacrifice availability.
Imagine we have a data source configured with one replica and the host serving
it dies; we then want the missing segments to become available as soon as
possible on a replacement host. This is the case of loading segments onto a
completely new host whose page cache contains none of the segments to load.
Copying segments to a null stream can take some time, and with a lot of
segments it can easily take more than 10 minutes to complete the full read
into the page cache, which is too long a period of unavailability. In
production we tried synchronous loading before announcing the segment, and it
was indeed too slow. Another case is restarting the Druid historical process
on the same host: reading the segments into a null stream is then fairly fast
because the OS has already cached them in the page cache, but it still takes
some time compared with announcing segments directly after download.
That said, the strategy of announcing a segment immediately after download and
asynchronously reading it into a null stream afterwards still has value:
without the change, since the OS only mmaps the portions of a segment that
queries actually touch, even after days of serving a segment on a historical,
a query hitting a previously untouched portion of the segment will still
trigger disk reads. With the change, we can ensure that roughly 10 minutes
after downloading the segments onto a historical (depending on the number of
segments), no subsequent query hits disk.
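The announce-first, warm-later idea described above can be sketched roughly as
follows. This is a minimal illustration, not the PR's actual implementation:
`PageCacheWarmer`, `warm`, and the temp file are hypothetical names standing in
for the segment files and the bootstrap executor mentioned in the diff.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PageCacheWarmer
{
  // Copy every byte of the file to a discarding stream. The sequential reads
  // pull the file's pages into the OS page cache as a side effect; the bytes
  // themselves are thrown away.
  static long warm(Path segmentFile) throws IOException
  {
    try (InputStream in = Files.newInputStream(segmentFile);
         OutputStream devNull = OutputStream.nullOutputStream()) {
      return in.transferTo(devNull);
    }
  }

  public static void main(String[] args) throws Exception
  {
    // Hypothetical stand-in for a downloaded segment file (1 MiB of zeros).
    Path segment = Files.createTempFile("segment", ".bin");
    Files.write(segment, new byte[1 << 20]);

    // Announce first, warm later: the warm-up runs on a separate executor so
    // the segment can serve queries immediately after download.
    ExecutorService exec = Executors.newSingleThreadExecutor();
    exec.submit(() -> {
      try {
        long bytes = warm(segment);
        System.out.println("warmed " + bytes + " bytes");
      }
      catch (IOException e) {
        // Best effort: a failed warm-up must not affect availability.
      }
    });
    exec.shutdown();
    exec.awaitTermination(1, TimeUnit.MINUTES);
    Files.deleteIfExists(segment);
  }
}
```

The key design point, as argued above, is that the warm-up is asynchronous and
best-effort: a slow or failed read into the null stream never delays segment
announcement.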
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]