rohangarg commented on PR #13369:
URL: https://github.com/apache/druid/pull/13369#issuecomment-1321552689

   One note: while I earlier thought batching was probably the way to go, it 
could be reconsidered in favour of caching combined with more parallelism, if 
that's easier. The current batching idea also caches `pendingSegments` table 
data in memory and updates the in-memory state for all requests in a batch to 
resolve their partition-num assignment conflicts.
   In a local experiment, I tried the following:
   1. Add striped locking for `(datasource, interval, sequence name)` tuple in 
`allocatePendingSegment` method
   2. The `createNewSegment` method is the critical section that requires a 
global lock, but it operates on cached results of the `pendingSegments` table. 
The cache could be keyed by `(datasource, interval)` and invalidated by time, 
access count, or a transaction failure while committing to the 
`pendingSegments` table
   3. The write to the `pendingSegments` table is also done under the striped 
lock
   4. More work could be extracted from `createNewSegment` (such as fetching 
used segments) into the striped-locking flow so that there's more parallelism
   Generally, the idea was to do the DB calls in parallel threads and use 
cached objects inside the global lock.
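   The striped-locking part of step 1 could be sketched roughly like this 
(note: the class and method names below are hypothetical illustrations, not 
actual Druid code; a fixed array of `ReentrantLock`s is hashed by the 
`(datasource, interval, sequenceName)` tuple so that allocations for different 
tuples can proceed in parallel while allocations for the same tuple serialize):

```java
import java.util.Objects;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: striped locking keyed on (datasource, interval, sequenceName).
public class SegmentAllocationStripes
{
  private final ReentrantLock[] stripes;

  public SegmentAllocationStripes(int numStripes)
  {
    stripes = new ReentrantLock[numStripes];
    for (int i = 0; i < numStripes; i++) {
      stripes[i] = new ReentrantLock();
    }
  }

  // Deterministically map the tuple onto one of the stripes, so the same
  // tuple always resolves to the same lock.
  public ReentrantLock lockFor(String dataSource, String interval, String sequenceName)
  {
    int hash = Objects.hash(dataSource, interval, sequenceName);
    return stripes[Math.floorMod(hash, stripes.length)];
  }
}
```

   A caller would acquire `lockFor(...)` around the per-tuple work (including 
the `pendingSegments` write in step 3), and take the global lock only for the 
`createNewSegment` critical section. Guava's `Striped` utility provides a 
ready-made version of this pattern, which may be preferable since Druid 
already depends on Guava.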


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

