Hi Artem,

You can check the maximum limits using the patch I sent earlier in this thread. Also, the patch http://patches.gluster.com/patch/5844/ (not yet accepted) checks whether the number of CQEs being passed to ibv_create_cq is greater than the value allowed by the device; if it is, it will try to create the CQ with the maximum limit the device allows.
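For illustration, a minimal sketch of that kind of fallback (not the actual patch; the helper name here is made up): query the device attributes and clamp the requested CQE count to max_cqe before calling ibv_create_cq.

/* Sketch only: clamp the requested CQE count to the device maximum. */
#include <stdio.h>
#include <infiniband/verbs.h>

static struct ibv_cq *
create_cq_clamped (struct ibv_context *ctx, int wanted_cqe)
{
        struct ibv_device_attr attr;

        if (ibv_query_device (ctx, &attr) != 0) {
                perror ("ibv_query_device");
                return NULL;
        }

        if (wanted_cqe > attr.max_cqe) {
                fprintf (stderr, "requested %d CQEs, device allows only %d; "
                         "falling back to the device maximum\n",
                         wanted_cqe, attr.max_cqe);
                wanted_cqe = attr.max_cqe;
        }

        /* no completion channel, default completion vector */
        return ibv_create_cq (ctx, wanted_cqe, NULL, NULL, 0);
}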
regards,
----- Original Message -----
From: "Artem Trunov" <[email protected]>
To: "Raghavendra G" <[email protected]>
Cc: "Jeremy Stout" <[email protected]>, [email protected]
Sent: Thursday, December 9, 2010 7:13:40 PM
Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1

Hi Raghavendra, Jeremy,

This has been a very interesting debugging thread for me, since I see the same symptoms but am unsure of their origin. Please see the log from my mount command at the end of this message.

I have installed 3.1.1. My OFED is 1.5.1 - does that make a serious difference compared to the 1.5.2 already mentioned?

On hardware limitations: I have a Mellanox InfiniHost III Lx 20Gb/s, and its specs say "Supports 16 million QPs, EEs & CQs". Is this enough? How can I query the actual settings for max_cq and max_cqe?

In general, how should I proceed? What are my other debugging options? Should I go down Jeremy's path and patch the gluster code?

cheers
Artem

Log:
---------
[2010-12-09 15:15:53.847595] W [io-stats.c:1644:init] test-volume: dangling volume. check volfile
[2010-12-09 15:15:53.847643] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-12-09 15:15:53.847657] W [dict.c:1204:data_to_str] dict: @data=(nil)
[2010-12-09 15:15:53.858574] E [rdma.c:2066:rdma_create_cq] rpc-transport/rdma: test-volume-client-1: creation of send_cq failed
[2010-12-09 15:15:53.858805] E [rdma.c:3771:rdma_get_device] rpc-transport/rdma: test-volume-client-1: could not create CQ
[2010-12-09 15:15:53.858821] E [rdma.c:3957:rdma_init] rpc-transport/rdma: could not create rdma device for mthca0
[2010-12-09 15:15:53.858893] E [rdma.c:4789:init] test-volume-client-1: Failed to initialize IB Device
[2010-12-09 15:15:53.858909] E [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma' initialization failed
pending frames:

patchset: v3.1.1
signal received: 11
time of crash: 2010-12-09 15:15:53
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.1.1
/lib64/libc.so.6[0x32aca302d0]
/lib64/libc.so.6(strcmp+0x0)[0x32aca79140]
/usr/lib64/glusterfs/3.1.1/rpc-transport/rdma.so[0x2aaaac4fef6c]
/usr/lib64/glusterfs/3.1.1/rpc-transport/rdma.so(init+0x2f)[0x2aaaac50013f]
/usr/lib64/libgfrpc.so.0(rpc_transport_load+0x389)[0x3fcca0cac9]
/usr/lib64/libgfrpc.so.0(rpc_clnt_new+0xfe)[0x3fcca1053e]
/usr/lib64/glusterfs/3.1.1/xlator/protocol/client.so(client_init_rpc+0xa1)[0x2aaaab194f01]
/usr/lib64/glusterfs/3.1.1/xlator/protocol/client.so(init+0x129)[0x2aaaab1950d9]
/usr/lib64/libglusterfs.so.0(xlator_init+0x58)[0x3fcc617398]
/usr/lib64/libglusterfs.so.0(glusterfs_graph_init+0x31)[0x3fcc640291]
/usr/lib64/libglusterfs.so.0(glusterfs_graph_activate+0x38)[0x3fcc6403c8]
/usr/sbin/glusterfs(glusterfs_process_volfp+0xfa)[0x40373a]
/usr/sbin/glusterfs(mgmt_getspec_cbk+0xc5)[0x406125]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2)[0x3fcca0f542]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x8d)[0x3fcca0f73d]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x2c)[0x3fcca0a95c]
/usr/lib64/glusterfs/3.1.1/rpc-transport/socket.so(socket_event_poll_in+0x3f)[0x2aaaaad6ef9f]
/usr/lib64/glusterfs/3.1.1/rpc-transport/socket.so(socket_event_handler+0x170)[0x2aaaaad6f130]
/usr/lib64/libglusterfs.so.0[0x3fcc637917]
/usr/sbin/glusterfs(main+0x39b)[0x40470b]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x32aca1d994]
/usr/sbin/glusterfs[0x402e29]
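To answer the question above about querying max_cq and max_cqe: ibv_devinfo -v (shipped with libibverbs) lists the device attributes, and they can also be read programmatically. A small stand-alone sketch, illustrative only, with minimal error handling:

#include <stdio.h>
#include <infiniband/verbs.h>

int
main (void)
{
        struct ibv_device      **devs;
        struct ibv_context      *ctx;
        struct ibv_device_attr   attr;

        /* open the first RDMA device found and print its CQ/MR limits */
        devs = ibv_get_device_list (NULL);
        if (!devs || !devs[0]) {
                fprintf (stderr, "no RDMA devices found\n");
                return 1;
        }

        ctx = ibv_open_device (devs[0]);
        if (!ctx || ibv_query_device (ctx, &attr) != 0) {
                fprintf (stderr, "could not open/query device\n");
                return 1;
        }

        printf ("max_cq  = %d\nmax_cqe = %d\nmax_mr  = %d\n",
                attr.max_cq, attr.max_cqe, attr.max_mr);

        ibv_close_device (ctx);
        ibv_free_device_list (devs);
        return 0;
}

Compile with: gcc -o ibv_limits ibv_limits.c -libverbs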
On Fri, Dec 3, 2010 at 1:53 PM, Raghavendra G <[email protected]> wrote:
> From the logs it is evident that the reason for the completion queue creation failure is that the number of completion queue elements (in a completion queue) we requested in ibv_create_cq (1024 * send_count = 1024 * 128 = 131072) is greater than the maximum supported by the IB hardware (max_cqe = 131071).
>
> ----- Original Message -----
> From: "Jeremy Stout" <[email protected]>
> To: "Raghavendra G" <[email protected]>
> Cc: [email protected]
> Sent: Friday, December 3, 2010 4:20:04 PM
> Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
>
> I patched the source code and rebuilt GlusterFS. Here are the full logs:
>
> Server:
> [2010-12-03 07:08:55.945804] I [glusterd.c:275:init] management: Using /etc/glusterd as working directory
> [2010-12-03 07:08:55.947692] E [rdma.c:2047:rdma_create_cq] rpc-transport/rdma: max_mr_size = 18446744073709551615, max_cq = 65408, max_cqe = 131071, max_mr = 131056
> [2010-12-03 07:08:55.953226] E [rdma.c:2079:rdma_create_cq] rpc-transport/rdma: rdma.management: creation of send_cq failed
> [2010-12-03 07:08:55.953509] E [rdma.c:3785:rdma_get_device] rpc-transport/rdma: rdma.management: could not create CQ
> [2010-12-03 07:08:55.953582] E [rdma.c:3971:rdma_init] rpc-transport/rdma: could not create rdma device for mthca0
> [2010-12-03 07:08:55.953668] E [rdma.c:4803:init] rdma.management: Failed to initialize IB Device
> [2010-12-03 07:08:55.953691] E [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma' initialization failed
> [2010-12-03 07:08:55.953780] I [glusterd.c:96:glusterd_uuid_init] glusterd: generated UUID: 4eb47ca7-227c-49c4-97bd-25ac177b2f6a
> Given volfile:
> +------------------------------------------------------------------------------+
> 1: volume management
> 2: type mgmt/glusterd
> 3: option working-directory /etc/glusterd
> 4: option transport-type socket,rdma
> 5: option transport.socket.keepalive-time 10
> 6: option transport.socket.keepalive-interval 2
> 7: end-volume
> 8:
>
> +------------------------------------------------------------------------------+
> [2010-12-03 07:09:10.244790] I [glusterd-handler.c:785:glusterd_handle_create_volume] glusterd: Received create volume req
> [2010-12-03 07:09:10.247646] I [glusterd-utils.c:232:glusterd_lock] glusterd: Cluster lock held by 4eb47ca7-227c-49c4-97bd-25ac177b2f6a
> [2010-12-03 07:09:10.247678] I [glusterd-handler.c:2835:glusterd_op_txn_begin] glusterd: Acquired local lock
> [2010-12-03 07:09:10.247708] I [glusterd3_1-mops.c:1091:glusterd3_1_cluster_lock] glusterd: Sent lock req to 0 peers
> [2010-12-03 07:09:10.248038] I [glusterd3_1-mops.c:1233:glusterd3_1_stage_op] glusterd: Sent op req to 0 peers
> [2010-12-03 07:09:10.251970] I [glusterd3_1-mops.c:1323:glusterd3_1_commit_op] glusterd: Sent op req to 0 peers
> [2010-12-03 07:09:10.252020] I [glusterd3_1-mops.c:1145:glusterd3_1_cluster_unlock] glusterd: Sent unlock req to 0 peers
> [2010-12-03 07:09:10.252036] I [glusterd-op-sm.c:4738:glusterd_op_txn_complete] glusterd: Cleared local lock
> [2010-12-03 07:09:22.11649] I [glusterd-handler.c:936:glusterd_handle_cli_start_volume] glusterd: Received start vol reqfor volume testdir
> [2010-12-03 07:09:22.11724] I [glusterd-utils.c:232:glusterd_lock] glusterd: Cluster lock held by 4eb47ca7-227c-49c4-97bd-25ac177b2f6a
> [2010-12-03 07:09:22.11734] I [glusterd-handler.c:2835:glusterd_op_txn_begin] glusterd: Acquired local lock
> [2010-12-03 07:09:22.11761] I [glusterd3_1-mops.c:1091:glusterd3_1_cluster_lock] glusterd: Sent lock req to 0 peers
> [2010-12-03 07:09:22.12120] I [glusterd3_1-mops.c:1233:glusterd3_1_stage_op] glusterd: Sent op req to 0 peers
> [2010-12-03 07:09:22.184403] I [glusterd-utils.c:971:glusterd_volume_start_glusterfs] : About to start glusterfs for brick pgh-submit-1:/mnt/gluster
> [2010-12-03 07:09:22.229143] I [glusterd3_1-mops.c:1323:glusterd3_1_commit_op] glusterd: Sent op req to 0 peers
> [2010-12-03 07:09:22.229198] I [glusterd3_1-mops.c:1145:glusterd3_1_cluster_unlock] glusterd: Sent unlock req to 0 peers
> [2010-12-03 07:09:22.229218] I [glusterd-op-sm.c:4738:glusterd_op_txn_complete] glusterd: Cleared local lock
> [2010-12-03 07:09:22.240157] I [glusterd-pmap.c:281:pmap_registry_remove] pmap: removing brick (null) on port 24009
>
> Client:
> [2010-12-03 07:09:00.82784] W [io-stats.c:1644:init] testdir: dangling volume. check volfile
> [2010-12-03 07:09:00.82824] W [dict.c:1204:data_to_str] dict: @data=(nil)
> [2010-12-03 07:09:00.82836] W [dict.c:1204:data_to_str] dict: @data=(nil)
> [2010-12-03 07:09:00.85980] E [rdma.c:2047:rdma_create_cq] rpc-transport/rdma: max_mr_size = 18446744073709551615, max_cq = 65408, max_cqe = 131071, max_mr = 131056
> [2010-12-03 07:09:00.92883] E [rdma.c:2079:rdma_create_cq] rpc-transport/rdma: testdir-client-0: creation of send_cq failed
> [2010-12-03 07:09:00.93156] E [rdma.c:3785:rdma_get_device] rpc-transport/rdma: testdir-client-0: could not create CQ
> [2010-12-03 07:09:00.93224] E [rdma.c:3971:rdma_init] rpc-transport/rdma: could not create rdma device for mthca0
> [2010-12-03 07:09:00.93313] E [rdma.c:4803:init] testdir-client-0: Failed to initialize IB Device
> [2010-12-03 07:09:00.93332] E [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma' initialization failed
> Given volfile:
> +------------------------------------------------------------------------------+
> 1: volume testdir-client-0
> 2: type protocol/client
> 3: option remote-host submit-1
> 4: option remote-subvolume /mnt/gluster
> 5: option transport-type rdma
> 6: end-volume
> 7:
> 8: volume testdir-write-behind
> 9: type performance/write-behind
> 10: subvolumes testdir-client-0
> 11: end-volume
> 12:
> 13: volume testdir-read-ahead
> 14: type performance/read-ahead
> 15: subvolumes testdir-write-behind
> 16: end-volume
> 17:
> 18: volume testdir-io-cache
> 19: type performance/io-cache
> 20: subvolumes testdir-read-ahead
> 21: end-volume
> 22:
> 23: volume testdir-quick-read
> 24: type performance/quick-read
> 25: subvolumes testdir-io-cache
> 26: end-volume
> 27:
> 28: volume testdir-stat-prefetch
> 29: type performance/stat-prefetch
> 30: subvolumes testdir-quick-read
> 31: end-volume
> 32:
> 33: volume testdir
> 34: type debug/io-stats
> 35: subvolumes testdir-stat-prefetch
> 36: end-volume
>
> +------------------------------------------------------------------------------+
>
> On Fri, Dec 3, 2010 at 12:38 AM, Raghavendra G <[email protected]> wrote:
>> Hi Jeremy,
>>
>> Can you apply the attached patch, rebuild and start glusterfs? Please make sure to send us the logs of glusterfs.
>>
>> regards,
>> ----- Original Message -----
>> From: "Jeremy Stout" <[email protected]>
>> To: [email protected]
>> Sent: Friday, December 3, 2010 6:38:00 AM
>> Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
>>
>> I'm currently using OFED 1.5.2.
>>
>> For the sake of testing, I just compiled GlusterFS 3.1.1 from source, without any modifications, on two systems that have a 2.6.33.7 kernel and OFED 1.5.2 built from source.
>> Here are the results:
>>
>> Server:
>> [2010-12-02 21:17:55.886563] I [glusterd-handler.c:936:glusterd_handle_cli_start_volume] glusterd: Received start vol reqfor volume testdir
>> [2010-12-02 21:17:55.886597] I [glusterd-utils.c:232:glusterd_lock] glusterd: Cluster lock held by 7dd23af5-277e-4ea1-a495-2a9d882287ec
>> [2010-12-02 21:17:55.886607] I [glusterd-handler.c:2835:glusterd_op_txn_begin] glusterd: Acquired local lock
>> [2010-12-02 21:17:55.886628] I [glusterd3_1-mops.c:1091:glusterd3_1_cluster_lock] glusterd: Sent lock req to 0 peers
>> [2010-12-02 21:17:55.887031] I [glusterd3_1-mops.c:1233:glusterd3_1_stage_op] glusterd: Sent op req to 0 peers
>> [2010-12-02 21:17:56.60427] I [glusterd-utils.c:971:glusterd_volume_start_glusterfs] : About to start glusterfs for brick submit-1:/mnt/gluster
>> [2010-12-02 21:17:56.104896] I [glusterd3_1-mops.c:1323:glusterd3_1_commit_op] glusterd: Sent op req to 0 peers
>> [2010-12-02 21:17:56.104935] I [glusterd3_1-mops.c:1145:glusterd3_1_cluster_unlock] glusterd: Sent unlock req to 0 peers
>> [2010-12-02 21:17:56.104953] I [glusterd-op-sm.c:4738:glusterd_op_txn_complete] glusterd: Cleared local lock
>> [2010-12-02 21:17:56.114764] I [glusterd-pmap.c:281:pmap_registry_remove] pmap: removing brick (null) on port 24009
>>
>> Client:
>> [2010-12-02 21:17:25.503395] W [io-stats.c:1644:init] testdir: dangling volume. check volfile
>> [2010-12-02 21:17:25.503434] W [dict.c:1204:data_to_str] dict: @data=(nil)
>> [2010-12-02 21:17:25.503447] W [dict.c:1204:data_to_str] dict: @data=(nil)
>> [2010-12-02 21:17:25.543409] E [rdma.c:2066:rdma_create_cq] rpc-transport/rdma: testdir-client-0: creation of send_cq failed
>> [2010-12-02 21:17:25.543660] E [rdma.c:3771:rdma_get_device] rpc-transport/rdma: testdir-client-0: could not create CQ
>> [2010-12-02 21:17:25.543725] E [rdma.c:3957:rdma_init] rpc-transport/rdma: could not create rdma device for mthca0
>> [2010-12-02 21:17:25.543812] E [rdma.c:4789:init] testdir-client-0: Failed to initialize IB Device
>> [2010-12-02 21:17:25.543830] E [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma' initialization failed
>>
>> Thank you for the help so far.
>>
>> On Thu, Dec 2, 2010 at 8:02 PM, Craig Carl <[email protected]> wrote:
>>> Jeremy -
>>> What version of OFED are you running? Would you mind installing version 1.5.2 from source? We have seen this resolve several issues of this type.
>>> http://www.openfabrics.org/downloads/OFED/ofed-1.5.2/
>>>
>>> Thanks,
>>>
>>> Craig
>>>
>>> --
>>> Craig Carl
>>> Senior Systems Engineer
>>> Gluster
>>>
>>> On 12/02/2010 10:05 AM, Jeremy Stout wrote:
>>>> As another follow-up, I tested several compilations today with different values for the send/receive count. The maximum value I could use for both variables was 127. With a value of 127, GlusterFS did not produce any errors. However, when I changed the value back to 128, the RDMA errors appeared again.
>>>>
>>>> I also tried setting the soft/hard "memlock" limit to unlimited in the limits.conf file, but still ran into RDMA errors on the client side when the count variables were set to 128.
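For reference, assuming the sizing described earlier in the thread (the RDMA transport asks ibv_create_cq for 1024 * send_count completion queue entries), the 127/128 boundary found above lines up exactly with the max_cqe value reported in the logs:

1024 * 127 = 130048, which is <= max_cqe (131071), so CQ creation succeeds
1024 * 128 = 131072, which is >  max_cqe (131071), so ibv_create_cq fails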
>>>> On Thu, Dec 2, 2010 at 9:04 AM, Jeremy Stout <[email protected]> wrote:
>>>>> Thank you for the response. I've been testing GlusterFS 3.1.1 on two different OpenSUSE 11.3 systems. Since both systems generated the same error messages, I'll include the output for both.
>>>>>
>>>>> System #1:
>>>>> fs-1:~ # cat /proc/meminfo
>>>>> MemTotal: 16468756 kB
>>>>> MemFree: 16126680 kB
>>>>> Buffers: 15680 kB
>>>>> Cached: 155860 kB
>>>>> SwapCached: 0 kB
>>>>> Active: 65228 kB
>>>>> Inactive: 123100 kB
>>>>> Active(anon): 18632 kB
>>>>> Inactive(anon): 48 kB
>>>>> Active(file): 46596 kB
>>>>> Inactive(file): 123052 kB
>>>>> Unevictable: 1988 kB
>>>>> Mlocked: 1988 kB
>>>>> SwapTotal: 0 kB
>>>>> SwapFree: 0 kB
>>>>> Dirty: 30072 kB
>>>>> Writeback: 4 kB
>>>>> AnonPages: 18780 kB
>>>>> Mapped: 12136 kB
>>>>> Shmem: 220 kB
>>>>> Slab: 39592 kB
>>>>> SReclaimable: 13108 kB
>>>>> SUnreclaim: 26484 kB
>>>>> KernelStack: 2360 kB
>>>>> PageTables: 2036 kB
>>>>> NFS_Unstable: 0 kB
>>>>> Bounce: 0 kB
>>>>> WritebackTmp: 0 kB
>>>>> CommitLimit: 8234376 kB
>>>>> Committed_AS: 107304 kB
>>>>> VmallocTotal: 34359738367 kB
>>>>> VmallocUsed: 314316 kB
>>>>> VmallocChunk: 34349860776 kB
>>>>> HardwareCorrupted: 0 kB
>>>>> HugePages_Total: 0
>>>>> HugePages_Free: 0
>>>>> HugePages_Rsvd: 0
>>>>> HugePages_Surp: 0
>>>>> Hugepagesize: 2048 kB
>>>>> DirectMap4k: 9856 kB
>>>>> DirectMap2M: 3135488 kB
>>>>> DirectMap1G: 13631488 kB
>>>>>
>>>>> fs-1:~ # uname -a
>>>>> Linux fs-1 2.6.32.25-November2010 #2 SMP PREEMPT Mon Nov 1 15:19:55 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
>>>>>
>>>>> fs-1:~ # ulimit -l
>>>>> 64
>>>>>
>>>>> System #2:
>>>>> submit-1:~ # cat /proc/meminfo
>>>>> MemTotal: 16470424 kB
>>>>> MemFree: 16197292 kB
>>>>> Buffers: 11788 kB
>>>>> Cached: 85492 kB
>>>>> SwapCached: 0 kB
>>>>> Active: 39120 kB
>>>>> Inactive: 76548 kB
>>>>> Active(anon): 18532 kB
>>>>> Inactive(anon): 48 kB
>>>>> Active(file): 20588 kB
>>>>> Inactive(file): 76500 kB
>>>>> Unevictable: 0 kB
>>>>> Mlocked: 0 kB
>>>>> SwapTotal: 67100656 kB
>>>>> SwapFree: 67100656 kB
>>>>> Dirty: 24 kB
>>>>> Writeback: 0 kB
>>>>> AnonPages: 18408 kB
>>>>> Mapped: 11644 kB
>>>>> Shmem: 184 kB
>>>>> Slab: 34000 kB
>>>>> SReclaimable: 8512 kB
>>>>> SUnreclaim: 25488 kB
>>>>> KernelStack: 2160 kB
>>>>> PageTables: 1952 kB
>>>>> NFS_Unstable: 0 kB
>>>>> Bounce: 0 kB
>>>>> WritebackTmp: 0 kB
>>>>> CommitLimit: 75335868 kB
>>>>> Committed_AS: 105620 kB
>>>>> VmallocTotal: 34359738367 kB
>>>>> VmallocUsed: 76416 kB
>>>>> VmallocChunk: 34359652640 kB
>>>>> HardwareCorrupted: 0 kB
>>>>> HugePages_Total: 0
>>>>> HugePages_Free: 0
>>>>> HugePages_Rsvd: 0
>>>>> HugePages_Surp: 0
>>>>> Hugepagesize: 2048 kB
>>>>> DirectMap4k: 7488 kB
>>>>> DirectMap2M: 16769024 kB
>>>>>
>>>>> submit-1:~ # uname -a
>>>>> Linux submit-1 2.6.33.7-November2010 #1 SMP PREEMPT Mon Nov 8 13:49:00 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
>>>>>
>>>>> submit-1:~ # ulimit -l
>>>>> 64
>>>>>
>>>>> I retrieved the memory information on each machine after starting the glusterd process.
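A side note on the memlock experiment mentioned above: the usual way to raise the locked-memory limit system-wide is /etc/security/limits.conf. The entries below are illustrative and only take effect for new login sessions, after which ulimit -l should report "unlimited":

# /etc/security/limits.conf
*    soft    memlock    unlimited
*    hard    memlock    unlimited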
>>>>> On Thu, Dec 2, 2010 at 3:51 AM, Raghavendra G <[email protected]> wrote:
>>>>>> Hi Jeremy,
>>>>>>
>>>>>> Can you also get the output of:
>>>>>>
>>>>>> # uname -a
>>>>>> # ulimit -l
>>>>>>
>>>>>> regards,
>>>>>> ----- Original Message -----
>>>>>> From: "Raghavendra G" <[email protected]>
>>>>>> To: "Jeremy Stout" <[email protected]>
>>>>>> Cc: [email protected]
>>>>>> Sent: Thursday, December 2, 2010 10:20:04 AM
>>>>>> Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
>>>>>>
>>>>>> Hi Jeremy,
>>>>>>
>>>>>> In order to diagnose why completion queue creation is failing (as indicated by the logs), we want to know how much free memory was available on your system when glusterfs was started.
>>>>>>
>>>>>> regards,
>>>>>> ----- Original Message -----
>>>>>> From: "Raghavendra G" <[email protected]>
>>>>>> To: "Jeremy Stout" <[email protected]>
>>>>>> Cc: [email protected]
>>>>>> Sent: Thursday, December 2, 2010 10:11:18 AM
>>>>>> Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
>>>>>>
>>>>>> Hi Jeremy,
>>>>>>
>>>>>> Yes, there might be some performance decrease, but it should not affect the working of rdma.
>>>>>>
>>>>>> regards,
>>>>>> ----- Original Message -----
>>>>>> From: "Jeremy Stout" <[email protected]>
>>>>>> To: [email protected]
>>>>>> Sent: Thursday, December 2, 2010 8:30:20 AM
>>>>>> Subject: Re: [Gluster-users] RDMA Problems with GlusterFS 3.1.1
>>>>>>
>>>>>> As an update to my situation, I think I have GlusterFS 3.1.1 working now. I was able to create and mount RDMA volumes without any errors.
>>>>>>
>>>>>> To fix the problem, I had to make the following changes on lines 3562 and 3563 in rdma.c:
>>>>>> options->send_count = 32;
>>>>>> options->recv_count = 32;
>>>>>>
>>>>>> The values were previously set to 128.
>>>>>>
>>>>>> I'll run some tests tomorrow to verify that it is working correctly. Assuming it does, what would be the expected side-effect of changing the values from 128 to 32? Will there be a decrease in performance?
>>>>>>
>>>>>> On Wed, Dec 1, 2010 at 10:07 AM, Jeremy Stout <[email protected]> wrote:
>>>>>>> Here are the results of the test:
>>>>>>> submit-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong
>>>>>>> local address: LID 0x0002, QPN 0x000406, PSN 0x703b96, GID ::
>>>>>>> local address: LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID ::
>>>>>>> local address: LID 0x0002, QPN 0x000408, PSN 0xd62272, GID ::
>>>>>>> local address: LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID ::
>>>>>>> local address: LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID ::
>>>>>>> local address: LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID ::
>>>>>>> local address: LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID ::
>>>>>>> local address: LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID ::
>>>>>>> local address: LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID ::
>>>>>>> local address: LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID ::
>>>>>>> local address: LID 0x0002, QPN 0x000410, PSN 0xee27b0, GID ::
>>>>>>> local address: LID 0x0002, QPN 0x000411, PSN 0x89c683, GID ::
>>>>>>> local address: LID 0x0002, QPN 0x000412, PSN 0xd025b3, GID ::
>>>>>>> local address: LID 0x0002, QPN 0x000413, PSN 0xcec8e4, GID ::
>>>>>>> local address: LID 0x0002, QPN 0x000414, PSN 0x37e5d2, GID ::
>>>>>>> local address: LID 0x0002, QPN 0x000415, PSN 0x29562e, GID ::
>>>>>>> remote address: LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
>>>>>>> remote address: LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
>>>>>>> remote address: LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
>>>>>>> remote address: LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
>>>>>>> remote address: LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
>>>>>>> remote address: LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
>>>>>>> remote address: LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
>>>>>>> remote address: LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
>>>>>>> remote address: LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
>>>>>>> remote address: LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
>>>>>>> remote address: LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
>>>>>>> remote address: LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
>>>>>>> remote address: LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
>>>>>>> remote address: LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
>>>>>>> remote address: LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID ::
>>>>>>> remote address: LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID ::
>>>>>>> 8192000 bytes in 0.01 seconds = 5917.47 Mbit/sec
>>>>>>> 1000 iters in 0.01 seconds = 11.07 usec/iter
>>>>>>>
>>>>>>> fs-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong submit-1
>>>>>>> local address: LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
>>>>>>> local address: LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
>>>>>>> local address: LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
>>>>>>> local address: LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
>>>>>>> local address: LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
>>>>>>> local address: LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
>>>>>>> local address: LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
>>>>>>> local address: LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
>>>>>>> local address: LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
>>>>>>> local address: LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
>>>>>>> local address: LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
>>>>>>> local address: LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
>>>>>>> local address: LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
>>>>>>> local address: LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
>>>>>>> local address: LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID ::
>>>>>>> local address: LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID ::
>>>>>>> remote address: LID 0x0002, QPN 0x000406, PSN 0x703b96, GID ::
>>>>>>> remote address: LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID ::
>>>>>>> remote address: LID 0x0002, QPN 0x000408, PSN 0xd62272, GID ::
>>>>>>> remote address: LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID ::
>>>>>>> remote address: LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID ::
>>>>>>> remote address: LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID ::
>>>>>>> remote address: LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID ::
>>>>>>> remote address: LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID ::
>>>>>>> remote address: LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID ::
>>>>>>> remote address: LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID ::
>>>>>>> remote address: LID 0x0002, QPN 0x000410, PSN 0xee27b0, GID ::
>>>>>>> remote address: LID 0x0002, QPN 0x000411, PSN 0x89c683, GID ::
>>>>>>> remote address: LID 0x0002, QPN 0x000412, PSN 0xd025b3, GID ::
>>>>>>> remote address: LID 0x0002, QPN 0x000413, PSN 0xcec8e4, GID ::
>>>>>>> remote address: LID 0x0002, QPN 0x000414, PSN 0x37e5d2, GID ::
>>>>>>> remote address: LID 0x0002, QPN 0x000415, PSN 0x29562e, GID ::
>>>>>>> 8192000 bytes in 0.01 seconds = 7423.65 Mbit/sec
>>>>>>> 1000 iters in 0.01 seconds = 8.83 usec/iter
>>>>>>>
>>>>>>> Based on the output, I believe it ran correctly.
>>>>>>>
>>>>>>> On Wed, Dec 1, 2010 at 9:51 AM, Anand Avati <[email protected]> wrote:
>>>>>>>> Can you verify that ibv_srq_pingpong works from the server where this log file is from?
>>>>>>>> Thanks,
>>>>>>>> Avati
>>>>>>>>
>>>>>>>> On Wed, Dec 1, 2010 at 7:44 PM, Jeremy Stout <[email protected]> wrote:
>>>>>>>>> Whenever I try to start or mount a GlusterFS 3.1.1 volume that uses RDMA, I'm seeing the following error messages in the log file on the server:
>>>>>>>>> [2010-11-30 18:37:53.51270] I [nfs.c:652:init] nfs: NFS service started
>>>>>>>>> [2010-11-30 18:37:53.51362] W [dict.c:1204:data_to_str] dict: @data=(nil)
>>>>>>>>> [2010-11-30 18:37:53.51375] W [dict.c:1204:data_to_str] dict: @data=(nil)
>>>>>>>>> [2010-11-30 18:37:53.59628] E [rdma.c:2066:rdma_create_cq] rpc-transport/rdma: testdir-client-0: creation of send_cq failed
>>>>>>>>> [2010-11-30 18:37:53.59851] E [rdma.c:3771:rdma_get_device] rpc-transport/rdma: testdir-client-0: could not create CQ
>>>>>>>>> [2010-11-30 18:37:53.59925] E [rdma.c:3957:rdma_init] rpc-transport/rdma: could not create rdma device for mthca0
>>>>>>>>> [2010-11-30 18:37:53.60009] E [rdma.c:4789:init] testdir-client-0: Failed to initialize IB Device
>>>>>>>>> [2010-11-30 18:37:53.60030] E [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma' initialization failed
>>>>>>>>>
>>>>>>>>> On the client, I see:
>>>>>>>>> [2010-11-30 18:43:49.653469] W [io-stats.c:1644:init] testdir: dangling volume. check volfile
>>>>>>>>> [2010-11-30 18:43:49.653573] W [dict.c:1204:data_to_str] dict: @data=(nil)
>>>>>>>>> [2010-11-30 18:43:49.653607] W [dict.c:1204:data_to_str] dict: @data=(nil)
>>>>>>>>> [2010-11-30 18:43:49.736275] E [rdma.c:2066:rdma_create_cq] rpc-transport/rdma: testdir-client-0: creation of send_cq failed
>>>>>>>>> [2010-11-30 18:43:49.736651] E [rdma.c:3771:rdma_get_device] rpc-transport/rdma: testdir-client-0: could not create CQ
>>>>>>>>> [2010-11-30 18:43:49.736689] E [rdma.c:3957:rdma_init] rpc-transport/rdma: could not create rdma device for mthca0
>>>>>>>>> [2010-11-30 18:43:49.736805] E [rdma.c:4789:init] testdir-client-0: Failed to initialize IB Device
>>>>>>>>> [2010-11-30 18:43:49.736841] E [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma' initialization failed
>>>>>>>>>
>>>>>>>>> This results in an unsuccessful mount.
>>>>>>>>>
>>>>>>>>> I created the volume using the following commands:
>>>>>>>>> /usr/local/glusterfs/3.1.1/sbin/gluster volume create testdir transport rdma submit-1:/exports
>>>>>>>>> /usr/local/glusterfs/3.1.1/sbin/gluster volume start testdir
>>>>>>>>>
>>>>>>>>> To mount the directory, I use:
>>>>>>>>> mount -t glusterfs submit-1:/testdir /mnt/glusterfs
>>>>>>>>>
>>>>>>>>> I don't think it is an Infiniband problem since GlusterFS 3.0.6 and GlusterFS 3.1.0 worked on the same systems. For GlusterFS 3.1.0, the commands listed above produced no error messages.
>>>>>>>>>
>>>>>>>>> If anyone can provide help with debugging these error messages, it would be appreciated.
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
