On Wed, 2007-10-03 at 13:42 -0400, Pete Wyckoff wrote:
> How does the requester (in IB speak) know that an RDMA Write
> operation has completed on the responder?
>
> We have a software iSER target, available at git.osc.edu/tgt or
> browse at http://git.osc.edu/?p=tgt.git . Using the existing
> in-kernel iSER initiator code, very rarely data corruption occurs,
> in that the received data from SCSI read operations does not match
> what was expected. Sometimes it appears as if random kernel memory
> has been scribbled on by an errant RDMA write from the target. My
> current working theory is that the RDMA write has not completed by the
> time the initiator looks at its incoming data buffer.
>
> Single RC QP, single CQ, no SRQ. Only Send, Receive, and RDMA Write
> work requests are used. After everything is connected up, a SCSI
> read sequence looks like:
>
> initiator: register pages with FMR, write test pattern
> initiator: Send request to target
> target: Recv request
> target: RDMA Write response to initiator
> target: Wait for CQ entry for local RDMA Write completion

Pete:
I don't think this should be necessary...

> target: Send response to initiator

...as long as the send is posted on the same SQ as the write.

> initiator: Recv response, access buffer
>
> On very rare occasions, this buffer will have the test pattern, not
> the data that the target just sent.
>
> Machines are opteron, fedora 7 up-to-date with its openfab libs,
> kernel 2.6.23-rc6 on target. Either 2.6.23-rc6 or 2.6.22 or
> 2.6.18-rhel5 on initiator. For some reason, it is much easier to
> produce with the rhel5 kernel. One site with fast disks can see
> similar corruption with 2.6.23-rc6, however. Target is pure
> userspace. Initiator is in kernel and is poked by "lmdd" (like
> normal dd) through an iSCSI block device (/dev/sdb).
>
> The IB spec seems to indicate that the contents of the RDMA Write
> buffer should be stable after completion of a subsequent send
> message (o9-20). In fact, the "Wait for CQ entry" step on the
> target should be unnecessary, no?

I think so too.

>
> Could there be some caching issues that the initiator is missing?
> I've added print[fk]s to the initiator and target to verify that the
> sequence of events is truly as above, and that the virtual addresses
> are as expected on both sides.
>
> Any suggestions or advice would help. Thanks,
>

If your theory is correct, the data should eventually show up. Does it?

Does your code check for errors on dma_map_single/page?

> -- Pete
>
>
> P.S. Here are some debugging printfs not in the git.
>
> Userspace code does 200 read()s of length 8000, but complains about
> the result somewhere in the 14th read, from 112000 to 120000, and
> exits early. Expected pattern is a series of 400000 4-byte words,
> incrementing by 4, starting from 0. So 0x00000000, 0x00000004, ...,
> 0x001869fc:
>
> % lmdd of=internal ipat=1 if=/dev/sdb bs=8000 count=200 mismatch=10
> off=112000 want=1c000 got=3b3b3b3b
>
> Initiator generates a series of SCSI operations, as driven by
> readahead and the block queue scheduler.
> You can see that it starts
> reading 4 pages, then 1 page, then 23 pages, then 1 page and so on,
> in order. These sizes and offsets vary from run to run. Each line
> here is printed after the SCSI read response has been received. It
> prints the first word in the buffer, and you can see the test
> pattern where data should be:
>
> tag 02 va 36061000 len 4000 word0 00000000 ref 1
> tag 03 va 36065000 len 1000 word0 00004000 ref 1
> tag 04 va 36066000 len 17000 word0 00005000 ref 1
> tag 05 va 7b6bc000 len 1000 word0 3b3b3b3b ref 1

Is it interesting that the bad word occurs on the first page of the new map?

> tag 06 va 7b6bd000 len 1f000 word0 0001d000 ref 1
> tag 07 va 7bdc2000 len 20000 word0 0003c000 ref 1
>
> The userspace target code prints a line when it starts the RDMA
> write, then a line when the RDMA write completes locally, then a
> line when it sends the response. The tags are what the initiator
> assigned to each request. The target thinks it is sending a
> 4096-byte buffer that has 0x1c000 as its first word, but the
> initiator did not see it:
>
> tag 02 va 36061000 len 4000 word0 00000000 rdmaw
> tag 02 rdmaw completion
> tag 02 resp
> tag 03 va 36065000 len 1000 word0 00004000 rdmaw
> tag 03 rdmaw completion
> tag 03 resp
> tag 04 va 36066000 len 17000 word0 00005000 rdmaw
> tag 04 rdmaw completion
> tag 04 resp
> tag 05 va 7b6bc000 len 1000 word0 0001c000 rdmaw
> tag 05 rdmaw completion
> tag 05 resp
> tag 06 va 7b6bd000 len 1f000 word0 0001d000 rdmaw
> tag 06 rdmaw completion
> tag 07 va 7bdc2000 len 20000 word0 0003c000 rdmaw
> tag 07 rdmaw completion
> tag 06 resp
> tag 07 resp
>
> _______________________________________________
> general mailing list
> general@lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general