alexeykudinkin commented on PR #5416: URL: https://github.com/apache/hudi/pull/5416#issuecomment-1183666935
@zhangyue19921010 thanks for your contribution! We did see the locks acquired by the queue in the current implementation of the `BoundedInMemoryExecutor` show up in our profiles, so this change is definitely a nice improvement!

That being said, I think the upside of using Disruptor is quite limited for Hudi due to our internal architecture. Disruptor's lock-free approach shines brightest when you have _many_ producers feeding _many_ consumers. In our case, though, Hudi has exactly 1 producer and 1 consumer in all but a few cases, b/c we're using BIMQ (`BoundedInMemoryQueue`) only w/in a single Spark partition, which is usually executed w/in a single Task on a single CPU core.

In that sense, BIMQ is just a way for us to balance the speed of reading against the speed of writing: if, say, the reader is reading too fast and the writer is not able to keep up, our memory buffers (filled by the reader) might overflow, leading to an OOM. BIMQ in that case serves as a back-pressure mechanism, slowing down the reader (with locks) until the writer is able to catch up. Avoiding locks in that path should reduce our compute footprint by about 10%, but from what I've seen so far I don't think we'd be able to get more than that out of it, unfortunately.

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
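To make the back-pressure role described in the comment above concrete, here is a minimal sketch of one reader and one writer sharing a small bounded queue. It is not Hudi's `BoundedInMemoryQueue` and not Disruptor: it uses the JDK's lock-based `java.util.concurrent.ArrayBlockingQueue`, and the class name `BackPressureSketch`, the `POISON` end-of-stream marker, the capacity, and the artificial writer delay are all assumptions made for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class BackPressureSketch {
    // Hypothetical end-of-stream marker; real records here are non-negative.
    private static final int POISON = -1;

    /** Runs one reader and one writer; returns {recordsWritten, maxBuffered}. */
    static int[] run(int numRecords, int capacity) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        AtomicInteger maxBuffered = new AtomicInteger();

        // Reader (producer): put() blocks while the queue is full, so a fast
        // reader can never buffer more than `capacity` records in memory.
        Thread reader = new Thread(() -> {
            try {
                for (int i = 0; i < numRecords; i++) {
                    queue.put(i);
                    maxBuffered.accumulateAndGet(queue.size(), Math::max);
                }
                queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();

        // Writer (consumer): deliberately slower than the reader.
        int written = 0;
        try {
            while (true) {
                int record = queue.take();
                if (record == POISON) {
                    break;
                }
                written++;
                if (written % 100 == 0) {
                    Thread.sleep(1); // simulate a slow write path
                }
            }
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("writer interrupted", e);
        }
        return new int[] {written, maxBuffered.get()};
    }

    public static void main(String[] args) {
        int[] result = run(1_000, 8);
        System.out.println("written=" + result[0] + ", maxBuffered=" + result[1]);
    }
}
```

With a single producer and a single consumer, the queue's job here is only to cap how far the reader can run ahead of the writer; the observed buffer depth never exceeds the configured capacity, whatever the relative speeds of the two threads.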
