oathdruid commented on issue #2762: URL: https://github.com/apache/brpc/issues/2762#issuecomment-2363842394
https://github.com/apache/brpc/blob/master/docs/cn/io.md
https://github.com/apache/brpc/blob/master/docs/cn/execution_queue.md

MPSC queues in brpc are generally implemented with this pattern; the two linked above both are, and braft probably uses it widely as well.

The classic two-ended (head and tail pointer) linked-list queue is Michael & Scott's [non-blocking queue](https://www.cs.rochester.edu/~scott/papers/1996_PODC_queues.pdf). Boost has a corresponding implementation, `boost::lockfree::queue`. Its enqueue depends on CAS, however, and CAS has to spin and retry when contention is high, so its concurrency is actually mediocre.

The core of brpc's single-ended linked list is replacing CAS with exchange, which never fails and never spins, and that is what improves concurrency. Because a node becomes the new tail by exchanging itself with the current tail, the links can only run from the tail back to the preceding nodes, so the result is a reversed list.

Looking purely at performance, though, array-based queues generally have somewhat higher overall throughput than linked-list queues. The main advantage of a linked-list queue is therefore that, when there are very many queues (for example, an access-layer server holding a large number of connections), no array capacity has to be reserved per queue, which saves memory. But with a large number of connections, contention on any single connection is usually low, and even a mutex-protected singly linked list is not much of a problem. In high-concurrency scenarios the number of connections is bounded, so moving to an array queue may give better concurrency (for example, going straight to io_uring or similar). Even with a large number of connections, clustering connections into groups and using an array queue per group can sometimes beat linked-list queues on balance.

Looking only at MPSC concurrency, I ran a simple benchmark quite a while ago (W = 万, i.e. ×10,000, so 100W is 1 million qps):

producer=12 consumer=1 | qps | cpu | latency | qps | cpu | latency | qps | cpu | latency
-- | -- | -- | -- | -- | -- | -- | -- | -- | --
boost::lockfree::queue | 1W | 1.008 | 0.92 | 100W | 2.608 | 87.06 | 170W | 5.34 | 314
bthread::ExecutionQueue | 1W | 0.015 | 6.03 | 100W | 0.566 | 405 | 347W | 1.53 | 1600
tbb::concurrent_bounded_queue | 1W | 0.012 | 4.57 | 100W | 0.561 | 369 | 597W | 3.37 | 2400
folly::MPMCQueue | 1W | 0.012 | 3.88 | 100W | 0.314 | 144 | 895W | 3.21 | 534
babylon::ConcurrentExecutionQueue | 1W | 0.011 | 4.86 | 100W | 0.166 | 5.65 | 1511W | 8.8 | 3.1
babylon::ConcurrentBoundedQueue | 1W | 0.010 | 3.68 | 100W | 0.258 | 5.37 | 1690W | 7.4 | 2.9
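
To make the exchange-based enqueue concrete, here is a minimal sketch of the pattern, not brpc's actual code. The names `Node`, `g_unconnected`, `ExchangeMpscList` and `run_mpsc_demo` are hypothetical, and the memory orders are one conservative choice. A producer swaps its node in as the new tail with a single `exchange` and only afterwards links it to the previous tail, so the consumer may briefly see a node whose link is not connected yet and has to wait for it. The consumer detaches the whole reversed list and flips it back into FIFO order.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical node type for this sketch.
struct Node {
    explicit Node(int v) : value(v), next(nullptr) {}
    int value;
    std::atomic<Node*> next;
};

// Sentinel meaning "already swapped in as tail, but not linked to its predecessor yet".
static Node g_unconnected(0);

class ExchangeMpscList {
public:
    // Producer side: a single exchange, no retry loop (a CAS-based push would
    // have to loop on compare_exchange until it wins).
    void push(Node* node) {
        node->next.store(&g_unconnected, std::memory_order_relaxed);
        Node* prev = tail_.exchange(node, std::memory_order_acq_rel);
        // The list is reversed: the newer node points back at the older one.
        node->next.store(prev, std::memory_order_release);
    }

    // Consumer side (one thread only): detach everything pushed so far and
    // return it relinked in FIFO order, or nullptr if nothing was pushed.
    Node* pop_all() {
        Node* cur = tail_.exchange(nullptr, std::memory_order_acq_rel);
        Node* fifo = nullptr;
        while (cur != nullptr) {
            Node* prev;
            // A producer may sit between its exchange and its link store.
            while ((prev = cur->next.load(std::memory_order_acquire)) == &g_unconnected) {
                std::this_thread::yield();
            }
            cur->next.store(fifo, std::memory_order_relaxed);  // reverse the link
            fifo = cur;
            cur = prev;
        }
        return fifo;
    }

private:
    std::atomic<Node*> tail_{nullptr};
};

// Hypothetical demo: `producers` threads push `per_producer` values each while
// the calling thread consumes. Returns true if nothing was lost and every
// producer's values arrived in the order that producer pushed them.
bool run_mpsc_demo(int producers, int per_producer) {
    ExchangeMpscList list;
    std::vector<std::thread> threads;
    for (int p = 0; p < producers; ++p) {
        threads.emplace_back([&list, p, per_producer] {
            for (int i = 0; i < per_producer; ++i) {
                list.push(new Node(p * per_producer + i));
            }
        });
    }
    std::vector<int> last_seen(producers, -1);
    const int total = producers * per_producer;
    int received = 0;
    bool ordered = true;
    while (received < total) {
        Node* node = list.pop_all();
        if (node == nullptr) {
            std::this_thread::yield();
            continue;
        }
        while (node != nullptr) {
            Node* next = node->next.load(std::memory_order_relaxed);
            const int p = node->value / per_producer;
            const int seq = node->value % per_producer;
            if (seq != last_seen[p] + 1) ordered = false;
            last_seen[p] = seq;
            ++received;
            delete node;
            node = next;
        }
    }
    for (auto& t : threads) t.join();
    return ordered && received == total;
}
```

Calling `run_mpsc_demo(12, 100000)` from a `main` mirrors the 12-producer, 1-consumer shape of the benchmark, but it only checks ordering and completeness and says nothing about throughput. The wait on `g_unconnected` is the price of the reversed list: producers never retry, while the single consumer occasionally waits for a link to be written.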