To be precise, "credit" is the number of concurrent sends (of ko2iblnd messages) to a single peer; it is not the number of in-flight Lustre RPCs. I understand the memory concern here. With map_on_demand enabled, ko2iblnd will create an FMR for bulk IO with many fragments (for example, 32+ fragments, or 128K+), and only small IOs will keep using the current path, which avoids the overhead of creating an FMR for them. We then have at most 32 fragments instead of 256, so the QP size is only 1/8 of what it is now.
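As a rough illustration of the sizing arithmetic above, here is a minimal sketch in plain user-space C. It uses the (credits X fragments) estimate from earlier in this thread; the helper names and the credit count of 8 in the note below are made up for the example and are not taken from ko2iblnd.

```c
#include <assert.h>

/* Hypothetical helper: fragments needed for a bulk transfer of `bytes`,
 * assuming one fragment per page (rounds up to whole pages). */
unsigned int frags_for(unsigned int bytes, unsigned int page_size)
{
        assert(page_size > 0);
        return (bytes + page_size - 1) / page_size;
}

/* Hypothetical helper: worst-case outstanding send work requests on one
 * connection, using the thread's estimate of credits x fragments. */
unsigned int send_wr_budget(unsigned int credits, unsigned int max_frags)
{
        assert(credits > 0 && max_frags > 0);
        return credits * max_frags;
}
```

With 4K pages, frags_for(1 << 20, 4096) is 256 and frags_for(128 * 1024, 4096) is 32. Assuming 8 credits purely for illustration, send_wr_budget(8, 256) is 2048 while send_wr_budget(8, 32) is 256, which is the 1/8 reduction mentioned above.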
Regards
Liang

On 9/2/14, 6:09 PM, "Alexey Lyashkov" <[email protected]> wrote:

> credits for Lustre ? it's works? now it's strange number without relation
> to real network structure and produce over buffering issues on server
> side.
>
> On Sep 2, 2014, at 12:22 PM, Zhen, Liang <[email protected]> wrote:
>
>> Yes, I think this is the potential issue of this patch, for each 1M
>> data lustre has 256 fragments (256 pages) on 4K pagesize system, which
>> means we can have max to (credits X 256) outstanding work requests for
>> each connection, decreasing max_send_wr may hit ib_post_send() failure
>> under heavy workload.
>>
>> I understand this may be a problem for low level stack to allocate big
>> chunk of space, and cause memory allocating failures. The solution is
>> enabling map_on_demand and use FMR, however, enabling this on some nodes
>> will prevent them to join cluster if other nodes have no map_on_demand,
>> we already have a patch for this which is pending on review, please
>> check this (LU-3322)
>>
>> Thanks
>> Liang
>>
>> From: David McMillen <[email protected]>
>> Date: Sunday, August 31, 2014 at 6:48 PM
>> To: "[email protected]" <[email protected]>, Eli Cohen <[email protected]>
>> Subject: Re: [Lustre-discuss] [PATCH] Avoid Lustre failure on temporary failure
>>
>> Has this been tested with a significant I/O load? We had tried a
>> similar approach but ran into subsequent errors and connection drops
>> when the ib_post_send() failed. The code assumes that the original
>> init_qp_attr->cap.max_send_wr value succeeded. Is there a second part
>> to this patch?
>>
>> Dave
>>
>> On Sun, Aug 31, 2014 at 2:53 AM, Eli Cohen <[email protected]> wrote:
>>
>>> Lustre code tries to create a QP with max_send_wr which depends on a module
>>> parameter. The device capabilities do provide the maximum number of send work
>>> requests that the device supports but the actual number of work requests that
>>> can be supported in a specific case depends on other characteristics of the
>>> work queue, the transport type, etc. This is in compliance with the IB spec:
>>>
>>> 11.2.1.2 QUERY HCA
>>> Description:
>>> Returns the attributes for the specified HCA.
>>> The maximum values defined in this section are guaranteed
>>> not-to-exceed values. It is possible for an implementation to allocate
>>> some HCA resources from the same space. In that case, the maximum
>>> values returned are not guaranteed for all of those resources
>>> simultaneously.
>>>
>>> This patch tries to decrease the number of requested work requests to a level
>>> that can be supported by the HCA. This prevents unnecessary failures.
>>>
>>> Signed-off-by: Eli Cohen <eli at mellanox.com>
>>> ---
>>>  lnet/klnds/o2iblnd/o2iblnd.c | 25 ++++++++++++++++++-------
>>>  1 file changed, 18 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/lnet/klnds/o2iblnd/o2iblnd.c b/lnet/klnds/o2iblnd/o2iblnd.c
>>> index 4061db00cba2..ef1c6e07cb45 100644
>>> --- a/lnet/klnds/o2iblnd/o2iblnd.c
>>> +++ b/lnet/klnds/o2iblnd/o2iblnd.c
>>> @@ -736,6 +736,7 @@ kiblnd_create_conn(kib_peer_t *peer, struct rdma_cm_id *cmid,
>>>          int cpt;
>>>          int rc;
>>>          int i;
>>> +        int orig_wr;
>>>
>>>          LASSERT(net != NULL);
>>>          LASSERT(!in_interrupt());
>>> @@ -862,13 +863,23 @@ kiblnd_create_conn(kib_peer_t *peer, struct rdma_cm_id *cmid,
>>>
>>>          conn->ibc_sched = sched;
>>>
>>> -        rc = rdma_create_qp(cmid, conn->ibc_hdev->ibh_pd, init_qp_attr);
>>> -        if (rc != 0) {
>>> -                CERROR("Can't create QP: %d, send_wr: %d, recv_wr: %d\n",
>>> -                       rc, init_qp_attr->cap.max_send_wr,
>>> -                       init_qp_attr->cap.max_recv_wr);
>>> -                goto failed_2;
>>> -        }
>>> +        orig_wr = init_qp_attr->cap.max_send_wr;
>>> +        do {
>>> +                rc = rdma_create_qp(cmid, conn->ibc_hdev->ibh_pd, init_qp_attr);
>>> +                if (!rc || init_qp_attr->cap.max_send_wr < 16)
>>> +                        break;
>>> +
>>> +                init_qp_attr->cap.max_send_wr /= 2;
>>> +        } while (rc);
>>> +        if (rc != 0) {
>>> +                CERROR("Can't create QP: %d, send_wr: %d, recv_wr: %d\n",
>>> +                       rc, init_qp_attr->cap.max_send_wr,
>>> +                       init_qp_attr->cap.max_recv_wr);
>>> +                goto failed_2;
>>> +        }
>>> +        if (orig_wr != init_qp_attr->cap.max_send_wr)
>>> +                pr_info("original send wr %d, created with %d\n",
>>> +                        orig_wr, init_qp_attr->cap.max_send_wr);
>>>
>>>          LIBCFS_FREE(init_qp_attr, sizeof(*init_qp_attr));
>>>
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> [email protected]
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
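For anyone who wants to experiment with the halving retry from the quoted patch outside the kernel, here is a minimal user-space sketch. fake_create_qp() and its device_limit argument are hypothetical stand-ins for rdma_create_qp() and whatever the HCA can actually allocate; only the loop shape (retry, stop below a floor, halve) follows the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for rdma_create_qp(): succeeds (returns 0) only when
 * the request fits within what this pretend device can allocate. */
int fake_create_qp(unsigned int max_send_wr, unsigned int device_limit)
{
        return max_send_wr <= device_limit ? 0 : -12; /* -12 mimics -ENOMEM */
}

/* Retry creation, halving the request after each failure, and give up once
 * a failed request is already below min_wr (16 in the quoted patch).
 * Returns 0 on success and stores the granted size in *granted. */
int create_qp_with_backoff(unsigned int requested, unsigned int min_wr,
                           unsigned int device_limit, unsigned int *granted)
{
        unsigned int wr = requested;
        int rc;

        assert(granted != NULL);
        for (;;) {
                rc = fake_create_qp(wr, device_limit);
                if (rc == 0 || wr < min_wr)
                        break;
                wr /= 2;
        }
        if (rc == 0)
                *granted = wr; /* caller must size its send accounting from this */
        return rc;
}
```

The point of returning the granted size is the concern Dave raised: if the rest of the code keeps assuming the originally requested max_send_wr, a later post can overrun the smaller queue. In this sketch a request of 2048 against a pretend limit of 600 is granted 512 after two halvings.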
