Hi Nagadheeraj,

I am no expert in crypto devices, so please correct me if I am wrong, but my impression is that this series uses barriers more often and more heavily than necessary.
For enqueue, as I understand it, the operations are stores to the DMA buffer. The crypto device fetches the queue entries and updates them after processing, and they are then dequeued, possibly by another CPU core. So enqueue needs an rte_io_wmb before ringing the doorbell, and an rte_smp_wmb to ensure the enqueue stores have completed before the consumer on the other side (the core that dequeues) sees the updated pending_count.

For dequeue, an rte_smp_rmb is required after reading pending_count, so that the consumer reads intact content from the queue. If the crypto device has not handled a queue entry yet, the entry's status will show that; an rte_io_rmb may be needed to ensure the status is read first.

The rte_smp_wmb/rte_smp_rmb pairs could even be replaced with C11 atomics, but that can be a next step.

Best Regards,
Gavin
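
P.S. To make the ordering I have in mind concrete, here is a rough sketch using C11 atomics in place of the rte_smp_wmb/rte_smp_rmb pair. It is not taken from the driver: demo_queue, demo_enqueue, demo_dequeue and RING_SIZE are hypothetical names, the entries are plain integers rather than real descriptors, and the device side (doorbell and completion status) appears only as comments. It models the CPU-to-CPU ordering around pending_count and nothing else.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring size for illustration; must be a power of two. */
#define RING_SIZE 8u
static_assert((RING_SIZE & (RING_SIZE - 1u)) == 0,
              "RING_SIZE must be a power of two");

/* Hypothetical stand-in for the driver's queue pair. */
struct demo_queue {
    uint64_t entries[RING_SIZE];    /* stand-in for descriptors in the DMA buffer */
    uint32_t head;                  /* touched only by the producer */
    uint32_t tail;                  /* touched only by the consumer */
    _Atomic uint32_t pending_count; /* shared between producer and consumer */
};

/* Producer side: fill the entry, then publish it through pending_count. */
static bool demo_enqueue(struct demo_queue *q, uint64_t desc)
{
    /* Acquire pairs with the consumer's release below, so a freed slot
     * is only reused after the consumer has finished reading it. */
    if (atomic_load_explicit(&q->pending_count,
                             memory_order_acquire) == RING_SIZE)
        return false;               /* ring is full */

    q->entries[q->head & (RING_SIZE - 1u)] = desc;
    q->head++;

    /* In a real driver, rte_io_wmb() followed by the doorbell write
     * would go here (not modelled in this sketch). */

    /* Release takes the role of rte_smp_wmb: the entry store above is
     * visible before the consumer can observe the new count. */
    atomic_fetch_add_explicit(&q->pending_count, 1, memory_order_release);
    return true;
}

/* Consumer side: observe pending_count, then read the entry. */
static bool demo_dequeue(struct demo_queue *q, uint64_t *desc)
{
    /* Acquire takes the role of rte_smp_rmb: the entry read below
     * cannot be reordered before this load. */
    if (atomic_load_explicit(&q->pending_count,
                             memory_order_acquire) == 0)
        return false;               /* nothing pending */

    /* A real driver would check the device-written status of the entry
     * here, possibly after rte_io_rmb() (not modelled in this sketch). */
    *desc = q->entries[q->tail & (RING_SIZE - 1u)];
    q->tail++;

    atomic_fetch_sub_explicit(&q->pending_count, 1, memory_order_release);
    return true;
}
```

The point of the sketch is only that one release/acquire pair on pending_count covers the producer-to-consumer ordering; whether the extra barriers in the series are needed for the device side is a separate question.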