alexeykudinkin commented on PR #5416:
URL: https://github.com/apache/hudi/pull/5416#issuecomment-1183666935

   @zhangyue19921010 thanks for your contribution!
   
   We did see locks acquired by the queue in the current implementation of 
the `BoundedInMemoryExecutor` show up in our profiles, so this change is 
definitely a nice improvement!
   
   That being said, I think the upside of using Disruptor is quite limited 
for Hudi due to our internal architecture: Disruptor's lock-free approach 
shines brightest when you have _many_ producers feeding _many_ consumers. In 
our case, though, Hudi in all but a few cases has exactly 1 producer and 1 
consumer, b/c we're using BIMQ (`BoundedInMemoryQueue`) only w/in a single 
Spark partition, which is usually executed w/in a single Task on a single CPU 
core. In that sense BIMQ is just a way for us to balance the speed of reading 
against the speed of writing: if, say, the reader is reading too fast and the 
writer is not able to keep up, our memory buffers (filled by the reader) might 
overflow, leading to an OOM. BIMQ in that case serves as a back-pressure 
mechanism, slowing down the reader (with locks) until the writer is able to 
catch up.
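   
   To make the back-pressure point concrete, here is a minimal sketch of the 
1-producer / 1-consumer pattern. It is not Hudi code: it assumes a plain 
`java.util.concurrent.ArrayBlockingQueue` as a stand-in for BIMQ, and the 
class name `BackPressureSketch`, the method `run`, and the `POISON` sentinel 
are hypothetical names used only for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative only: a bounded, lock-based queue standing in for BIMQ.
final class BackPressureSketch {
    // Hypothetical end-of-input marker; not a Hudi identifier.
    private static final Integer POISON = Integer.MIN_VALUE;

    /** Runs one producer (reader) and one consumer (writer); returns the sum consumed. */
    static long run(int records, int capacity) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        long[] sum = new long[1];

        // Consumer ("writer"): drains the queue until it sees the sentinel.
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    Integer rec = queue.take();
                    if (rec.equals(POISON)) {
                        break;
                    }
                    sum[0] += rec;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            // Producer ("reader"): put() blocks while the queue is full, so a
            // fast reader is slowed down instead of filling memory unboundedly.
            for (int i = 1; i <= records; i++) {
                queue.put(i);
            }
            queue.put(POISON);
            consumer.join(); // join() makes the consumer's writes to sum visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while producing", e);
        }
        return sum[0];
    }

    public static void main(String[] args) {
        System.out.println(run(1000, 8)); // prints 500500
    }
}
```

   With `capacity` set to 8 the producer can never be more than 8 records 
ahead of the consumer, which is the memory bound being described; the cost is 
the lock taken on each `put`/`take`.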
   
   Avoiding locks in that path should reduce our compute footprint by about 
10%, but from what I've seen so far I don't think we'd be able to get more 
than that out of it, unfortunately.
   
   
   

