surekhasaharan opened a new issue #6834: [Proposal] Add published segment cache 
in broker
URL: https://github.com/apache/incubator-druid/issues/6834
 
 
   ## Problem:
   
   Some `sys.segments` queries are slow, taking as long as ~10-20 seconds, which is not desirable. The cause of this slowness is a call from the broker to a coordinator API that happens every time a query is issued against the `sys.segments` table: the `getMetaDataSegments` method (which invokes the coordinator API `/druid/coordinator/v1/metadata/segments`) is called from `SegmentsTable#scan()` in `SystemSchema.java`. The coordinator can potentially return millions of segments, and most of the time is spent parsing the JSON response and creating `DataSegment` objects.
   
   ## Motivation:
   
   It would be useful to make these queries faster, as they are used interactively by end users today. In the future, a unified Druid console can be built on top of the sys tables (#6832), and the new segment locking can also benefit from having all used segments present in the broker.
   
   ## Proposed Changes:
   
   To fix this performance bottleneck, the plan is to add:
   
   1. a published segment cache in the broker (phase 1)
   2. a new API in the coordinator (phase 2)
   
   #### Phase 1
   
   To speed up `sys.segments` queries, phase 1 adds a published segment cache in the broker. The broker already maintains a cache of all available segments via the `ServerView` (brokers cache segments announced by historicals), but not the segments from the metadata store (published segments are cached only in the coordinator). The new cache would be updated in the background and would therefore allow faster query response times from the broker.
   A potential issue is memory pressure on the broker if the number of published segments is large. To minimize this, the `DataSegment` instance should be shared between the "available" segments in the existing broker cache and the "published" segments in the new segment cache. Roughly, for about a million segments that are both published and available, the heap space for the references would be ~10 MB. In addition, the complete `DataSegment` object would be stored only for "published but unavailable" segments, which ideally should be close to 0 segments.
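   One way to realize this instance sharing is an interner keyed by segment id. The sketch below is illustrative only: the class and method names are hypothetical, and the generic `T` stands in for Druid's `DataSegment` so the example stays self-contained.
   
   ```java
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.function.Function;
   
   // Hypothetical sketch: intern segments by id so the new "published" cache and
   // the existing "available" cache share one instance instead of holding two
   // copies of the same segment.
   class SegmentInterner<T>
   {
     private final Map<String, T> canonical = new ConcurrentHashMap<>();
     private final Function<T, String> idFn;
   
     SegmentInterner(Function<T, String> idFn)
     {
       this.idFn = idFn;
     }
   
     // Returns the canonical instance for this segment's id: either one already
     // cached (e.g. announced by a historical) or, if none exists yet, the
     // incoming instance itself.
     T intern(T segment)
     {
       return canonical.computeIfAbsent(idFn.apply(segment), id -> segment);
     }
   }
   ```
   
   Because both caches end up holding the same reference, a segment that is both published and available costs one object plus two references, which is what keeps the per-million-segments overhead in the ~10 MB range estimated above.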
   
   #### Phase 2
   
   In phase 2 of this improvement, a more efficient coordinator API should be added. There are several ways to add this new coordinator API; see the rejected alternatives below for the other options considered.
   
   The proposed API takes a timestamp as an argument and returns a delta of added/removed segments. When the broker comes up, it gets all the published segments from the coordinator. The broker then orders the received segments by timestamp (`created_date`), saves the published segments in its cache, and keeps track of the last received segment's timestamp. Subsequent calls to the coordinator API only return the segments that have been added or removed since that timestamp. The broker polls the coordinator API at a regular interval in a background thread to keep the published segment cache synced.
   The "added_segments" delta can be computed from `created_date`; additional work is required to compute the "deleted_segments" delta. The coordinator will need to maintain an in-memory list of deleted segments and will need to be notified when a segment is killed externally to the coordinator (unless this behavior is changed as suggested in #6816). Since the deleted segments count can grow, to avoid memory pressure the coordinator can remember an hour (or some other configurable value) of deleted segments. If the requested timestamp is older than an hour, all the published segments can be resynced. In case of a coordinator restart or leader change, the coordinator can again send all the published segments.
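   The broker-side bookkeeping described above might look roughly like the following. All names are hypothetical, the response shape of the proposed API is not yet defined, and a segment is reduced here to an (id, created_date) pair; the real cache would hold shared `DataSegment` instances.
   
   ```java
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;
   
   // Hypothetical sketch of the broker's published segment cache being kept in
   // sync via the proposed delta API.
   class PublishedSegmentCache
   {
     // Published segments keyed by segment id, value is created_date.
     private final Map<String, String> published = new HashMap<>();
     // created_date of the newest segment seen so far; sent on the next poll.
     private String lastSyncedTimestamp = null;
   
     // Apply one poll result: `added` holds (id, created_date) pairs ordered by
     // created_date ascending; `removedIds` lists segments deleted since the
     // last poll.
     void applyDelta(List<Map.Entry<String, String>> added, List<String> removedIds)
     {
       for (Map.Entry<String, String> e : added) {
         published.put(e.getKey(), e.getValue());
         lastSyncedTimestamp = e.getValue();
       }
       for (String id : removedIds) {
         published.remove(id);
       }
     }
   
     String lastSyncedTimestamp() { return lastSyncedTimestamp; }
     int size() { return published.size(); }
   }
   ```
   
   The initial full sync is just `applyDelta` with every published segment and an empty removal list; each subsequent poll passes `lastSyncedTimestamp()` to the coordinator.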
   
   ## New or Changed Public Interface:
   A new REST endpoint will be added to the coordinator:
   ```GET /segments/{timestamp}```
   
   A timestamp field will be added to the `DataSegment` object, representing the `created_date` column from the `druid_segments` table in the metadata store.
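   On the coordinator side, the decision between serving a delta and forcing a full resync could be sketched as follows. The class and method names are assumptions, and the retention duration corresponds to the configurable one-hour window of remembered deletions described above.
   
   ```java
   import java.time.Duration;
   import java.time.Instant;
   
   // Hypothetical sketch: the coordinator retains deleted-segment history only
   // for a bounded window; requests older than that window get a full resync.
   class DeletionWindow
   {
     private final Duration retention;
   
     DeletionWindow(Duration retention)
     {
       this.retention = retention;
     }
   
     // True if the deletions since `requestedTimestamp` are still retained, so
     // a delta can be served; false means the broker must resync all published
     // segments (which is also the answer right after a coordinator restart or
     // leader change, since the in-memory deletion list is lost).
     boolean canServeDelta(Instant requestedTimestamp, Instant now)
     {
       return !requestedTimestamp.isBefore(now.minus(retention));
     }
   }
   ```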
   
   ## Rejected Alternatives:
   These options were also considered for the coordinator API:
   1. The coordinator sends just the `ids` of the published segments instead of complete serialized `DataSegment` objects; the broker then does a diff to find the segments that are not available, and makes another call to get the details for those segments. This approach was rejected because the segment id list can be quite large, causing significant network traffic between the coordinator and broker processes, and it may not achieve the performance improvement we are looking for.
   2. Add a new table, "druid_transactionlogs", to the metadata store, which keeps track of segment additions and removals. The coordinator API can then query this table when it receives a GET request from the broker for any timestamp; it can also query it to maintain its own cache. For example,
   
   | operation  | segment_id | timestamp |
   | ------------- | ------------- | ------------- |
   | add  | s1  |ts_0  |
   | disable  | s2  |ts_1  |
   | delete  | s3  |ts_1  |
   
   This table can use write-ahead logging to handle failures/restarts in any process. While this approach is good for maintaining consistency between the coordinator and broker caches as well as for fault tolerance, it may not give the speed improvement if the database is queried on each API invocation. Another challenge would be keeping the `druid_segments` and `druid_transactionlogs` tables in sync. Unless this is needed for broader use cases, it may not be worth the extra design and effort.
   
   
