I've been struggling with the fridge threads. For NFS/RDMA, once we are running a work thread, we really don't want to hand off to another work thread during processing in thr_decode_rpc_request(), because the handoff adds considerable latency.
Panasas also reports latency and stalls in VFS::PanFS.

Currently, the fridge starts processing some work, then stops and re-queues a work request via nfs_rpc_enqueue_req(nfsreq) at the end of thr_decode_rpc_request().

By the comments in nfs_rpc_dispatcher_thread.c:

  * Next, the preferred dispatch thread should be, I speculate, one
  * which has (most) recently handled a request for this xprt.

("I" isn't identified.)

So there is extra complication in nfs_rpc_getreq_ng()
* calling fridgethr_submit(req_fridge, thr_decode_rpc_requests, xprt),
** which in turn runs thr_decode_rpc_requests()
*** to loop on any multiple requests per xprt,
**** handing each to a separate worker.

All to attempt "locality of reference" for a worker.

This is particularly bad for RDMA, as serializing multiple requests this way means the largest number of buffers has to be held outstanding at one time!

Perhaps a simpler and more efficient design would borrow from Internet routing: Weighted Fair Queuing (WFQ).

We could more easily insert jobs into a weighted array of queues, and then the thread keeps going without any handoff until done (or until it has to wait for another event).

If, after completing a req, another req for the same xprt is found, that next req should be moved to the end of the weighted queue, so that other xprts aren't treated unfairly.

Those are the two basic elements of Weighted Fair Queuing.
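
To make the two elements above concrete, here is a minimal sketch in C. It is not Ganesha code: struct xprt_entry, struct fifo, fifo_push(), fifo_pop(), wfq_run() and NQUEUES are all hypothetical names invented for illustration, and the per-queue weight is applied as a simple weighted round-robin, which only approximates true WFQ (real WFQ orders work by computed finish times). Each request is "processed" to completion in the same loop iteration, with no handoff, and an xprt that still has requests pending goes back to the tail of its queue.

```c
#include <stddef.h>

#define NQUEUES 2   /* hypothetical number of weight classes */

/* Hypothetical stand-in for a transport with requests waiting on it. */
struct xprt_entry {
    int id;
    int pending;                /* requests still waiting on this xprt */
    struct xprt_entry *next;
};

/* One FIFO per weight class; weight = entries served per round. */
struct fifo {
    struct xprt_entry *head;
    struct xprt_entry *tail;
    int weight;
};

static void fifo_push(struct fifo *q, struct xprt_entry *x)
{
    x->next = NULL;
    if (q->tail)
        q->tail->next = x;
    else
        q->head = x;
    q->tail = x;
}

static struct xprt_entry *fifo_pop(struct fifo *q)
{
    struct xprt_entry *x = q->head;

    if (x) {
        q->head = x->next;
        if (!q->head)
            q->tail = NULL;
        x->next = NULL;
    }
    return x;
}

/*
 * Serve all queues until they are empty. Each served request runs to
 * completion here (represented by recording the xprt id in order[]);
 * there is no handoff to another worker. If the xprt still has requests
 * pending, it is moved to the end of its queue so other xprts get a turn.
 * Returns the number of requests served.
 */
static int wfq_run(struct fifo *queues, int nqueues, int *order, int max)
{
    int n = 0;
    int served;

    do {
        served = 0;
        for (int i = 0; i < nqueues; i++) {
            for (int k = 0; k < queues[i].weight; k++) {
                struct xprt_entry *x = fifo_pop(&queues[i]);

                if (!x)
                    break;
                if (n < max)
                    order[n] = x->id;   /* "process" one request */
                n++;
                served++;
                x->pending--;
                if (x->pending > 0)
                    fifo_push(&queues[i], x);   /* back of the line */
            }
        }
    } while (served > 0);
    return n;
}
```

For example, with xprt 1 (3 requests) and xprt 2 (1 request) in a weight-2 queue, and xprt 3 (2 requests) in a weight-1 queue, this sketch serves the requests in the order 1, 2, 3, 1, 1, 3: xprt 1 does not get its second request until xprt 2 and xprt 3 have each had a turn.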