renatocron commented on issue #14914: URL: https://github.com/apache/druid/issues/14914#issuecomment-1699028147
Hello! 2 years is potentially a lot of segments that need to be scanned, so you are missing one of the main advantage of having a primary index based on time, if you ask the database to scan all the segments all the time. There are a few options to consider: You can try turn cache on on the historical nodes (no on the broker, caching on the broker would not matter for this query), that could help if you don't have any late arriving data and reserve a some RAM for caching, both on historicals node RAM as well on the historical: https://druid.apache.org/docs/latest/configuration/#historical-caching But I'm afraid druid can't otimize `longLast`, because it may need to scan all the matched segments to send to the broker to compare which is the overall latest. You can consider make a external Service / Application: This service could be configured to read data from the same ingestion sources and update a separate, faster storage system (like Redis, DuckDB or an in-memory database). Here's a simple outline of how you could design it: - The service reads from the same ingestion source that Druid uses. - It updates a faster storage (like Redis). - The structures stored in this faster database mirror those in Druid. - You query the faster storage directly for the most recent transactions. - If this fails, you fall back to querying Druid. - This system would also rely on Druid as a source of truth, when your application boots up or if the in-memory database fails. This approach frees you from sending each query to Druid and allows you to retrieve the most recent transactions much more quickly. It does, however, require this additional service to be maintained. 
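For reference, enabling the historical cache might look roughly like this in the historical's `runtime.properties` (a sketch; the size is a placeholder you would tune to your hardware):

```
# historical runtime.properties (sketch; tune sizes for your deployment)
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=caffeine
# On-heap cache size in bytes (example: 1 GiB); account for it in the heap budget
druid.cache.sizeInBytes=1073741824
```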
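The outline above could be sketched like this. This is only an illustration: `LatestTransactionService` and `druid_fallback` are hypothetical names, the dict stands in for Redis/DuckDB, and the Druid query itself is assumed to be wrapped behind the fallback callable:

```python
# Sketch of a read-through "latest transaction" store with Druid as fallback
# and source of truth. The dict stands in for Redis/DuckDB; `druid_fallback`
# is a hypothetical callable wrapping a Druid query (e.g. one using longLast).
from typing import Callable, Dict, Optional


class LatestTransactionService:
    def __init__(self, druid_fallback: Callable[[str], Optional[dict]]):
        self._store: Dict[str, dict] = {}  # key -> latest row seen so far
        self._druid_fallback = druid_fallback

    def on_ingest(self, key: str, row: dict) -> None:
        """Called for every event read from the same source Druid ingests from;
        keeps only the row with the greatest timestamp per key."""
        current = self._store.get(key)
        if current is None or row["__time"] >= current["__time"]:
            self._store[key] = row

    def latest(self, key: str) -> Optional[dict]:
        """Serve from the fast store; on a miss, fall back to Druid."""
        row = self._store.get(key)
        if row is not None:
            return row
        row = self._druid_fallback(key)  # query Druid, the source of truth
        if row is not None:
            self._store[key] = row  # warm the store from Druid's answer
        return row
```

Usage would look like feeding `on_ingest` from your ingestion stream and serving reads from `latest`, so only cold keys (or a rebuild after a restart) ever hit Druid.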
Nevertheless, you could also check whether segment compaction is optimized for your needs.

**Segment optimization (secondary partitioning):** if your data has some natural order or frequently queried fields, you can specify secondary partitioning using autocompaction with range partitioning, for example: [https://druid.apache.org/docs/latest/ingestion/native-batch#benefits-of-range-partitioning](https://druid.apache.org/docs/latest/ingestion/native-batch#benefits-of-range-partitioning). Although time-based partitioning is the primary indexing method in Druid, secondary partitioning can improve query latency by reducing the amount of data that needs to be scanned. For example, you could group rows with `triggerState=COMPLETED` into their own segments, or create a new datasource just for them.
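As a sketch, an autocompaction config partitioning by `triggerState` might look like this (the datasource name and row target are placeholders):

```json
{
  "dataSource": "transactions",
  "tuningConfig": {
    "partitionsSpec": {
      "type": "range",
      "partitionDimensions": ["triggerState"],
      "targetRowsPerSegment": 5000000
    }
  }
}
```

With range partitioning on `triggerState`, queries filtering on that dimension can prune whole segments instead of scanning everything in the time range.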
