[PR] Merge hydrant runners flatly for realtime queries. (druid)

via GitHub Wed, 24 Jan 2024 23:21:52 -0800


gianm opened a new pull request, #15757:
URL: https://github.com/apache/druid/pull/15757


   Prior to this patch, we have two layers of mergeRunners for realtime 
queries: one for each Sink (a logical segment) and one across all Sinks. This 
is done to keep metrics and results grouped by Sink, given that each 
FireHydrant within a Sink has its own separate storage adapter.
   
   However, the extra layer of materialization costs additional memory. This 
is especially pronounced for groupBy queries, which only 
use their merge buffers at the top layer of merging. The lower layer of merging 
materializes ResultRows directly into the heap, which can cause heap exhaustion 
if there are enough ResultRows.
   
   This patch switches to a single layer of merging when bySegment: false, 
just like Historicals. To accommodate that, segment metrics like 
query/segment/time are now emitted per FireHydrant instead of per Sink.
   
   Two layers of merging are retained when bySegment: true. That setting 
isn't common, because it's typically only used when segment-level caching is 
enabled on the Broker, which is off by default.
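
To make the difference between the two strategies concrete, here is a minimal sketch. It is not Druid code: `HydrantMergeSketch`, `merge`, `twoLayerMerge`, `flatMerge`, and `run` are hypothetical names, rows are plain integers, and a simple sort stands in for mergeRunners. The `intermediateRows` counter shows how many rows the per-Sink layer materializes before the top-level merge sees them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class HydrantMergeSketch {
    /** Rows copied into per-Sink intermediate lists by the last twoLayerMerge call. */
    static int intermediateRows = 0;

    /** Stand-in for mergeRunners (an assumption): combine row lists into one sorted list. */
    static List<Integer> merge(List<List<Integer>> inputs) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> in : inputs) {
            out.addAll(in);
        }
        Collections.sort(out);
        return out;
    }

    /** Old behavior: merge the hydrants of each Sink first, then merge across Sinks. */
    static List<Integer> twoLayerMerge(List<List<List<Integer>>> sinks) {
        intermediateRows = 0;
        List<List<Integer>> perSink = new ArrayList<>();
        for (List<List<Integer>> hydrants : sinks) {
            List<Integer> sinkResult = merge(hydrants); // extra materialization
            intermediateRows += sinkResult.size();
            perSink.add(sinkResult);
        }
        return merge(perSink);
    }

    /** New behavior for bySegment: false: one merge across every hydrant of every Sink. */
    static List<Integer> flatMerge(List<List<List<Integer>>> sinks) {
        List<List<Integer>> allHydrants = new ArrayList<>();
        for (List<List<Integer>> hydrants : sinks) {
            allHydrants.addAll(hydrants);
        }
        return merge(allHydrants);
    }

    /** Chooses the strategy the way the description above does, keyed on bySegment. */
    static List<Integer> run(List<List<List<Integer>>> sinks, boolean bySegment) {
        return bySegment ? twoLayerMerge(sinks) : flatMerge(sinks);
    }

    public static void main(String[] args) {
        // Two Sinks, each holding two hydrants.
        List<List<List<Integer>>> sinks = List.of(
            List.of(List.of(1, 4), List.of(2)),
            List.of(List.of(3), List.of(0, 5)));
        System.out.println(run(sinks, true) + " intermediateRows=" + intermediateRows);
        System.out.println(run(sinks, false));
    }
}
```

Both paths return the same rows. The flat path simply skips the per-Sink lists, which is where the extra heap usage in the two-layer approach comes from.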


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

