Just one more information I need from you. Assuming you have the coredump, could you attach it to gdb and print local->fop and tell me what fop it was when the crash happened? You'll need to switch to frame 3 in gdb to get the value of this variable.
-Krutika On Wed, Dec 21, 2016 at 5:35 PM, Krutika Dhananjay <[email protected]> wrote: > Thanks for this. The information seems sufficient at the moment. > Will get back to you on this if/when I find something. > > -Krutika > > On Mon, Dec 19, 2016 at 1:44 PM, qingwei wei <[email protected]> wrote: > >> Hi Krutika, >> >> Sorry for the delay as i am busy with other works. Attached is the >> tar.gz file with client and server log, the gfid information on the >> shard folder (please look at test.0.0 file as the log is captured when >> i run fio on this file.) and also the print statement i put inside the >> code. Fyi, i did 2 runs this time and only the second run give me >> problem. Hope this information helps. >> >> Regards, >> >> Cw >> >> On Thu, Dec 15, 2016 at 8:02 PM, Krutika Dhananjay <[email protected]> >> wrote: >> > Good that you asked. I'll try but be warned this will involve me coming >> back >> > to you with lot more questions. :) >> > >> > I've been trying this for the past two days (not to mention the fio run >> > takes >> > really long) and so far there has been no crash/assert failure. >> > >> > If you already have the core: >> > in frame 1, >> > 0. print block_num >> > 1. get lru_inode_ctx->stat.ia_gfid >> > 2. convert it to hex >> > 3. find the gfid in your backend that corresponds to this gfid and >> share its >> > path in your response >> > 4. print priv->inode_count >> > 5. and of course lru_inode_ctx->block_num :) >> > 6. Also attach the complete brick and client logs. >> > >> > -Krutika >> > >> > >> > On Thu, Dec 15, 2016 at 3:18 PM, qingwei wei <[email protected]> >> wrote: >> >> >> >> Hi Krutika, >> >> >> >> Do you need anymore information? Do let me know as i can try on my >> >> test system. Thanks. >> >> >> >> Cw >> >> >> >> On Tue, Dec 13, 2016 at 12:17 AM, qingwei wei <[email protected]> >> wrote: >> >> > Hi Krutika, >> >> > >> >> > You mean FIO command? >> >> > >> >> > Below is how i do the sequential write. This example i am using 400GB >> >> > file, for the SHARD_MAX_INODE=16, i use 300MB file. >> >> > >> >> > fio -group_reporting -ioengine libaio -directory /mnt/testSF-HDD1 >> >> > -fallocate none -direct 1 -filesize 400g -nrfiles 1 -openfiles 1 -bs >> >> > 256k -numjobs 1 -iodepth 2 -name test -rw write >> >> > >> >> > And after FIO complete the above workload, i do the random write >> >> > >> >> > fio -group_reporting -ioengine libaio -directory /mnt/testSF-HDD1 >> >> > -fallocate none -direct 1 -filesize 400g -nrfiles 1 -openfiles 1 -bs >> >> > 8k -numjobs 1 -iodepth 2 -name test -rw randwrite >> >> > >> >> > The error (Sometimes segmentation fault) only happen during random >> >> > write. >> >> > >> >> > The gluster volume is 3 replica volume with shard enable and 16MB >> >> > shard block size. >> >> > >> >> > Thanks. >> >> > >> >> > Cw >> >> > >> >> > On Tue, Dec 13, 2016 at 12:00 AM, Krutika Dhananjay >> >> > <[email protected]> wrote: >> >> >> I tried but couldn't recreate this issue (even with SHARD_MAX_INODES >> >> >> being >> >> >> 16). >> >> >> Could you share the exact command you used? >> >> >> >> >> >> -Krutika >> >> >> >> >> >> On Mon, Dec 12, 2016 at 12:15 PM, qingwei wei <[email protected]> >> >> >> wrote: >> >> >>> >> >> >>> Hi Krutika, >> >> >>> >> >> >>> Thanks. Looking forward to your reply. >> >> >>> >> >> >>> Cw >> >> >>> >> >> >>> On Mon, Dec 12, 2016 at 2:27 PM, Krutika Dhananjay >> >> >>> <[email protected]> >> >> >>> wrote: >> >> >>> > Hi, >> >> >>> > >> >> >>> > First of all, apologies for the late reply. Couldn't find time to >> >> >>> > look >> >> >>> > into >> >> >>> > this >> >> >>> > until now. >> >> >>> > >> >> >>> > Changing SHARD_MAX_INODES value from 12384 to 16 is a cool trick! >> >> >>> > Let me try that as well and get back to you in some time. >> >> >>> > >> >> >>> > -Krutika >> >> >>> > >> >> >>> > On Thu, Dec 8, 2016 at 11:07 AM, qingwei wei < >> [email protected]> >> >> >>> > wrote: >> >> >>> >> >> >> >>> >> Hi, >> >> >>> >> >> >> >>> >> With the help from my colleague, we did some changes to the code >> >> >>> >> with >> >> >>> >> reduce number of SHARD_MAX_INODES (from 16384 to 16) and also >> >> >>> >> include >> >> >>> >> the printing of blk_num inside __shard_update_shards_inode_list. >> We >> >> >>> >> then execute fio to first do sequential write of 300MB file. >> After >> >> >>> >> this run completed, we then use fio to generate random write >> (8k). >> >> >>> >> And >> >> >>> >> during this random write run, we found that there is situation >> >> >>> >> where >> >> >>> >> the blk_num is negative number and this trigger the following >> >> >>> >> assertion. >> >> >>> >> >> >> >>> >> GF_ASSERT (lru_inode_ctx->block_num > 0); >> >> >>> >> >> >> >>> >> [2016-12-08 03:16:34.217582] E >> >> >>> >> [shard.c:468:__shard_update_shards_inode_list] >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> >> (-->/usr/local/lib/glusterfs/3.7.17/xlator/features/shard.so >> (shard_common_lookup_shards_cbk+0x2d) >> >> >>> >> [0x7f7300930b6d] >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> >> -->/usr/local/lib/glusterfs/3.7.17/xlator/features/shard.so( >> shard_link_block_inode+0xce) >> >> >>> >> [0x7f7300930b1e] >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> >> -->/usr/local/lib/glusterfs/3.7.17/xlator/features/shard.so( >> __shard_update_shards_inode_list+0x36b) >> >> >>> >> [0x7f730092bf5b] ) 0-: Assertion failed: >> lru_inode_ctx->block_num > >> >> >>> >> 0 >> >> >>> >> >> >> >>> >> Also, there is segmentation fault shortly after this assertion >> and >> >> >>> >> after that fio exit with error. >> >> >>> >> >> >> >>> >> frame : type(0) op(0) >> >> >>> >> patchset: git://git.gluster.com/glusterfs.git >> >> >>> >> signal received: 11 >> >> >>> >> time of crash: >> >> >>> >> 2016-12-08 03:16:34 >> >> >>> >> configuration details: >> >> >>> >> argp 1 >> >> >>> >> backtrace 1 >> >> >>> >> dlfcn 1 >> >> >>> >> libpthread 1 >> >> >>> >> llistxattr 1 >> >> >>> >> setfsid 1 >> >> >>> >> spinlock 1 >> >> >>> >> epoll.h 1 >> >> >>> >> xattr.h 1 >> >> >>> >> st_atim.tv_nsec 1 >> >> >>> >> package-string: glusterfs 3.7.17 >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> >> /usr/local/lib/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x9 >> 2)[0x7f730e900332] >> >> >>> >> >> >> >>> >> /usr/local/lib/libglusterfs.so.0(gf_print_trace+0x2d5)[0x7f7 >> 30e9250b5] >> >> >>> >> /lib64/libc.so.6(+0x35670)[0x7f730d1f1670] >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> >> /usr/local/lib/glusterfs/3.7.17/xlator/features/shard.so(__s >> hard_update_shards_inode_list+0x1d4)[0x7f730092bdc4] >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> >> /usr/local/lib/glusterfs/3.7.17/xlator/features/shard.so(sha >> rd_link_block_inode+0xce)[0x7f7300930b1e] >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> >> /usr/local/lib/glusterfs/3.7.17/xlator/features/shard.so(sha >> rd_common_lookup_shards_cbk+0x2d)[0x7f7300930b6d] >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> >> /usr/local/lib/glusterfs/3.7.17/xlator/cluster/distribute.so >> (dht_lookup_cbk+0x380)[0x7f7300b8e240] >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> >> /usr/local/lib/glusterfs/3.7.17/xlator/protocol/client.so(cl >> ient3_3_lookup_cbk+0x769)[0x7f7300df4989] >> >> >>> >> >> >> >>> >> >> >> >>> >> /usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7 >> f730e6ce010] >> >> >>> >> /usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x1df)[0x7f730e >> 6ce2ef] >> >> >>> >> >> >> >>> >> /usr/local/lib/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f >> 730e6ca483] >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> >> /usr/local/lib/glusterfs/3.7.17/rpc-transport/socket.so(+0x6 >> 344)[0x7f73034dc344] >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> >> /usr/local/lib/glusterfs/3.7.17/rpc-transport/socket.so(+0x8 >> f44)[0x7f73034def44] >> >> >>> >> /usr/local/lib/libglusterfs.so.0(+0x925aa)[0x7f730e96c5aa] >> >> >>> >> /lib64/libpthread.so.0(+0x7dc5)[0x7f730d96ddc5] >> >> >>> >> >> >> >>> >> Core dump: >> >> >>> >> >> >> >>> >> Using host libthread_db library "/lib64/libthread_db.so.1". >> >> >>> >> Core was generated by `/usr/local/sbin/glusterfs >> >> >>> >> --volfile-server=10.217.242.32 --volfile-id=/testSF1'. >> >> >>> >> Program terminated with signal 11, Segmentation fault. >> >> >>> >> #0 list_del_init (old=0x7f72f4003de0) at >> >> >>> >> ../../../../libglusterfs/src/list.h:87 >> >> >>> >> 87 old->prev->next = old->next; >> >> >>> >> >> >> >>> >> bt >> >> >>> >> >> >> >>> >> #0 list_del_init (old=0x7f72f4003de0) at >> >> >>> >> ../../../../libglusterfs/src/list.h:87 >> >> >>> >> #1 __shard_update_shards_inode_list >> >> >>> >> (linked_inode=linked_inode@entry=0x7f72fa7a6e48, >> >> >>> >> this=this@entry=0x7f72fc0090c0, base_inode=0x7f72fa7a5108, >> >> >>> >> block_num=block_num@entry=10) at shard.c:469 >> >> >>> >> #2 0x00007f7300930b1e in shard_link_block_inode >> >> >>> >> (local=local@entry=0x7f730ec4ed00, block_num=10, >> inode=<optimized >> >> >>> >> out>, >> >> >>> >> buf=buf@entry=0x7f730180c990) at shard.c:1559 >> >> >>> >> #3 0x00007f7300930b6d in shard_common_lookup_shards_cbk >> >> >>> >> (frame=0x7f730c611204, cookie=<optimized out>, >> this=0x7f72fc0090c0, >> >> >>> >> op_ret=0, >> >> >>> >> op_errno=<optimized out>, inode=<optimized out>, >> >> >>> >> buf=0x7f730180c990, xdata=0x7f730c029cdc, >> >> >>> >> postparent=0x7f730180ca00) >> >> >>> >> at shard.c:1596 >> >> >>> >> #4 0x00007f7300b8e240 in dht_lookup_cbk (frame=0x7f730c61dc40, >> >> >>> >> cookie=<optimized out>, this=<optimized out>, op_ret=0, >> >> >>> >> op_errno=22, >> >> >>> >> inode=0x7f72fa7a6e48, stbuf=0x7f730180c990, >> >> >>> >> xattr=0x7f730c029cdc, >> >> >>> >> postparent=0x7f730180ca00) at dht-common.c:2362 >> >> >>> >> #5 0x00007f7300df4989 in client3_3_lookup_cbk (req=<optimized >> >> >>> >> out>, >> >> >>> >> iov=<optimized out>, count=<optimized out>, >> myframe=0x7f730c616ab4) >> >> >>> >> at client-rpc-fops.c:2988 >> >> >>> >> #6 0x00007f730e6ce010 in rpc_clnt_handle_reply >> >> >>> >> (clnt=clnt@entry=0x7f72fc04c040, >> >> >>> >> pollin=pollin@entry=0x7f72fc079560) >> >> >>> >> at rpc-clnt.c:796 >> >> >>> >> #7 0x00007f730e6ce2ef in rpc_clnt_notify (trans=<optimized >> out>, >> >> >>> >> mydata=0x7f72fc04c070, event=<optimized out>, >> data=0x7f72fc079560) >> >> >>> >> at rpc-clnt.c:967 >> >> >>> >> #8 0x00007f730e6ca483 in rpc_transport_notify >> >> >>> >> (this=this@entry=0x7f72fc05bd30, >> >> >>> >> event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, >> >> >>> >> data=data@entry=0x7f72fc079560) at rpc-transport.c:546 >> >> >>> >> #9 0x00007f73034dc344 in socket_event_poll_in >> >> >>> >> (this=this@entry=0x7f72fc05bd30) at socket.c:2250 >> >> >>> >> #10 0x00007f73034def44 in socket_event_handler (fd=fd@entry=10, >> >> >>> >> idx=idx@entry=2, data=0x7f72fc05bd30, poll_in=1, poll_out=0, >> >> >>> >> poll_err=0) >> >> >>> >> at socket.c:2363 >> >> >>> >> #11 0x00007f730e96c5aa in event_dispatch_epoll_handler >> >> >>> >> (event=0x7f730180ced0, event_pool=0xf42ee0) at event-epoll.c:575 >> >> >>> >> #12 event_dispatch_epoll_worker (data=0xf8d650) at >> >> >>> >> event-epoll.c:678 >> >> >>> >> #13 0x00007f730d96ddc5 in start_thread () from >> >> >>> >> /lib64/libpthread.so.0 >> >> >>> >> #14 0x00007f730d2b2ced in clone () from /lib64/libc.so.6 >> >> >>> >> >> >> >>> >> It seems like there is some situation where the structure is not >> >> >>> >> intialized properly? Appreciate if anyone can advice. Thanks. >> >> >>> >> >> >> >>> >> Cw >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> >> On Wed, Dec 7, 2016 at 9:42 AM, qingwei wei < >> [email protected]> >> >> >>> >> wrote: >> >> >>> >> > Hi, >> >> >>> >> > >> >> >>> >> > I did another test and this time FIO fails with >> >> >>> >> > >> >> >>> >> > fio: io_u error on file /mnt/testSF-HDD1/test: Invalid >> argument: >> >> >>> >> > write >> >> >>> >> > offset=114423242752, buflen=8192 >> >> >>> >> > fio: pid=10052, err=22/file:io_u.c:1582, func=io_u error, >> >> >>> >> > error=Invalid >> >> >>> >> > argument >> >> >>> >> > >> >> >>> >> > test: (groupid=0, jobs=1): err=22 (file:io_u.c:1582, func=io_u >> >> >>> >> > error, >> >> >>> >> > error=Invalid argument): pid=10052: Tue Dec 6 15:18:47 2016 >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > Below is the client log: >> >> >>> >> > >> >> >>> >> > [2016-12-06 05:19:31.261289] I >> >> >>> >> > [fuse-bridge.c:5171:fuse_graph_setup] >> >> >>> >> > 0-fuse: switched to graph 0 >> >> >>> >> > [2016-12-06 05:19:31.261355] I [MSGID: 114035] >> >> >>> >> > [client-handshake.c:193:client_set_lk_version_cbk] >> >> >>> >> > 0-testSF-HDD-client-5: Server lk version = 1 >> >> >>> >> > [2016-12-06 05:19:31.261404] I [fuse-bridge.c:4083:fuse_init] >> >> >>> >> > 0-glusterfs-fuse: FUSE inited with protocol versions: >> glusterfs >> >> >>> >> > 7.22 >> >> >>> >> > kernel 7.22 >> >> >>> >> > [2016-12-06 05:19:31.262901] I [MSGID: 108031] >> >> >>> >> > [afr-common.c:2071:afr_local_discovery_cbk] >> >> >>> >> > 0-testSF-HDD-replicate-0: >> >> >>> >> > selecting local read_child testSF-HDD-client-1 >> >> >>> >> > [2016-12-06 05:19:31.262930] I [MSGID: 108031] >> >> >>> >> > [afr-common.c:2071:afr_local_discovery_cbk] >> >> >>> >> > 0-testSF-HDD-replicate-0: >> >> >>> >> > selecting local read_child testSF-HDD-client-0 >> >> >>> >> > [2016-12-06 05:19:31.262948] I [MSGID: 108031] >> >> >>> >> > [afr-common.c:2071:afr_local_discovery_cbk] >> >> >>> >> > 0-testSF-HDD-replicate-0: >> >> >>> >> > selecting local read_child testSF-HDD-client-2 >> >> >>> >> > [2016-12-06 05:19:31.269592] I [MSGID: 108031] >> >> >>> >> > [afr-common.c:2071:afr_local_discovery_cbk] >> >> >>> >> > 0-testSF-HDD-replicate-1: >> >> >>> >> > selecting local read_child testSF-HDD-client-3 >> >> >>> >> > [2016-12-06 05:19:31.269795] I [MSGID: 108031] >> >> >>> >> > [afr-common.c:2071:afr_local_discovery_cbk] >> >> >>> >> > 0-testSF-HDD-replicate-1: >> >> >>> >> > selecting local read_child testSF-HDD-client-4 >> >> >>> >> > [2016-12-06 05:19:31.277763] I [MSGID: 108031] >> >> >>> >> > [afr-common.c:2071:afr_local_discovery_cbk] >> >> >>> >> > 0-testSF-HDD-replicate-1: >> >> >>> >> > selecting local read_child testSF-HDD-client-5 >> >> >>> >> > [2016-12-06 06:58:05.399244] W [MSGID: 101159] >> >> >>> >> > [inode.c:1219:__inode_unlink] 0-inode: >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > be318638-e8a0-4c6d-977d-7a937aa84806/864c9ea1-3a7e-4d41- >> 899b-f30604a7584e.16284: >> >> >>> >> > dentry not found in 63af10b7-9dac-4a53-aab1-3cc17fff3255 >> >> >>> >> > [2016-12-06 15:17:43.311400] E >> >> >>> >> > [shard.c:460:__shard_update_shards_inode_list] >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > (-->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(sha >> rd_common_lookup_shards_cbk+0x2d) >> >> >>> >> > [0x7f5575680fdd] >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > -->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(shar >> d_link_block_inode+0xdf) >> >> >>> >> > [0x7f5575680f6f] >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > -->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(__sh >> ard_update_shards_inode_list+0x22e) >> >> >>> >> > [0x7f557567c1ce] ) 0-: Assertion failed: >> lru_inode_ctx->block_num >> >> >>> >> > > 0 >> >> >>> >> > [2016-12-06 15:17:43.311472] W [inode.c:1232:inode_unlink] >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > (-->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(sha >> rd_link_block_inode+0xdf) >> >> >>> >> > [0x7f5575680f6f] >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > -->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(__sh >> ard_update_shards_inode_list+0x14a) >> >> >>> >> > [0x7f557567c0ea] -->/lib64/libglusterfs.so.0(in >> ode_unlink+0x9c) >> >> >>> >> > [0x7f558386ba0c] ) 0-testSF-HDD-shard: inode not found >> >> >>> >> > [2016-12-06 15:17:43.333456] W [inode.c:1133:inode_forget] >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > (-->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(sha >> rd_link_block_inode+0xdf) >> >> >>> >> > [0x7f5575680f6f] >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > -->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(__sh >> ard_update_shards_inode_list+0x154) >> >> >>> >> > [0x7f557567c0f4] -->/lib64/libglusterfs.so.0(in >> ode_forget+0x90) >> >> >>> >> > [0x7f558386b800] ) 0-testSF-HDD-shard: inode not found >> >> >>> >> > [2016-12-06 15:18:47.129794] W >> >> >>> >> > [fuse-bridge.c:2311:fuse_writev_cbk] >> >> >>> >> > 0-glusterfs-fuse: 12555429: WRITE => -1 >> >> >>> >> > gfid=864c9ea1-3a7e-4d41-899b-f30604a7584e fd=0x7f557016ae6c >> >> >>> >> > (Invalid >> >> >>> >> > argument) >> >> >>> >> > >> >> >>> >> > Below is the code and it will go to the else block when >> >> >>> >> > inode_count >> >> >>> >> > is >> >> >>> >> > greater than SHARD_MAX_INODES which is 16384. And my dataset >> of >> >> >>> >> > 400GB >> >> >>> >> > with 16MB shard size has enough shard file (400GB/16MB) to >> >> >>> >> > achieve >> >> >>> >> > it. >> >> >>> >> > When i do the test with smaller dataset, there is no such >> error. >> >> >>> >> > >> >> >>> >> > shard.c >> >> >>> >> > >> >> >>> >> > if (priv->inode_count + 1 <= >> SHARD_MAX_INODES) { >> >> >>> >> > /* If this inode was linked here for the first >> >> >>> >> > time >> >> >>> >> > (indicated >> >> >>> >> > * by empty list), and if there is still >> space in >> >> >>> >> > the >> >> >>> >> > priv list, >> >> >>> >> > * add this ctx to the tail of the list. >> >> >>> >> > */ >> >> >>> >> > gf_uuid_copy (ctx->base_gfid, >> >> >>> >> > base_inode->gfid); >> >> >>> >> > ctx->block_num = block_num; >> >> >>> >> > list_add_tail (&ctx->ilist, >> >> >>> >> > &priv->ilist_head); >> >> >>> >> > priv->inode_count++; >> >> >>> >> > } else { >> >> >>> >> > /*If on the other hand there is no available >> slot >> >> >>> >> > for >> >> >>> >> > this inode >> >> >>> >> > * in the list, delete the lru inode from the >> >> >>> >> > head of >> >> >>> >> > the list, >> >> >>> >> > * unlink it. And in its place add this new >> inode >> >> >>> >> > into >> >> >>> >> > the list. >> >> >>> >> > */ >> >> >>> >> > lru_inode_ctx = list_first_entry >> >> >>> >> > (&priv->ilist_head, >> >> >>> >> > >> >> >>> >> > shard_inode_ctx_t, >> >> >>> >> > >> ilist); >> >> >>> >> > /* add in message for debug*/ >> >> >>> >> > gf_msg (THIS->name, GF_LOG_WARNING, 0, >> >> >>> >> > SHARD_MSG_INVALID_FOP, >> >> >>> >> > "block number = %d", >> >> >>> >> > lru_inode_ctx->block_num); >> >> >>> >> > >> >> >>> >> > GF_ASSERT (lru_inode_ctx->block_num > >> 0); >> >> >>> >> > >> >> >>> >> > Hopefully can get some advice from you guys on this. Thanks. >> >> >>> >> > >> >> >>> >> > Cw >> >> >>> >> > >> >> >>> >> > On Tue, Dec 6, 2016 at 9:07 AM, qingwei wei < >> [email protected]> >> >> >>> >> > wrote: >> >> >>> >> >> Hi, >> >> >>> >> >> >> >> >>> >> >> This is the repost of my email in the gluster-user mailing >> list. >> >> >>> >> >> Appreciate if anyone has any idea on the issue i have now. >> >> >>> >> >> Thanks. >> >> >>> >> >> >> >> >>> >> >> I encountered this when i do the FIO random write on the fuse >> >> >>> >> >> mount >> >> >>> >> >> gluster volume. After this assertion happen, the client log >> is >> >> >>> >> >> filled >> >> >>> >> >> with pending frames messages and FIO just show zero IO in the >> >> >>> >> >> progress >> >> >>> >> >> status. As i leave this test to run overnight, the client log >> >> >>> >> >> file >> >> >>> >> >> fill up with those pending frame messages and hit 28GB for >> >> >>> >> >> around 12 >> >> >>> >> >> hours. >> >> >>> >> >> >> >> >>> >> >> The client log: >> >> >>> >> >> >> >> >>> >> >> [2016-12-04 15:48:35.274208] W [MSGID: 109072] >> >> >>> >> >> [dht-linkfile.c:50:dht_linkfile_lookup_cbk] 0-testSF-dht: >> got >> >> >>> >> >> non-linkfile >> >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> testSF-replicate-0:/.shard/21da7b64-45e5-4c6a-9244-53d0284bf >> 7ed.7038, >> >> >>> >> >> gfid = 00000000-0000-0000-0000-000000000000 >> >> >>> >> >> [2016-12-04 15:48:35.277208] W [MSGID: 109072] >> >> >>> >> >> [dht-linkfile.c:50:dht_linkfile_lookup_cbk] 0-testSF-dht: >> got >> >> >>> >> >> non-linkfile >> >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> testSF-replicate-0:/.shard/21da7b64-45e5-4c6a-9244-53d0284bf >> 7ed.8957, >> >> >>> >> >> gfid = 00000000-0000-0000-0000-000000000000 >> >> >>> >> >> [2016-12-04 15:48:35.277588] W [MSGID: 109072] >> >> >>> >> >> [dht-linkfile.c:50:dht_linkfile_lookup_cbk] 0-testSF-dht: >> got >> >> >>> >> >> non-linkfile >> >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> testSF-replicate-0:/.shard/21da7b64-45e5-4c6a-9244-53d0284bf >> 7ed.11912, >> >> >>> >> >> gfid = 00000000-0000-0000-0000-000000000000 >> >> >>> >> >> [2016-12-04 15:48:35.312751] E >> >> >>> >> >> [shard.c:460:__shard_update_shards_inode_list] >> >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> (-->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(sha >> rd_common_lookup_shards_cbk+0x2d) >> >> >>> >> >> [0x7f86cc42efdd] >> >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> -->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(shar >> d_link_block_inode+0xdf) >> >> >>> >> >> [0x7f86cc42ef6f] >> >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> -->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(__sh >> ard_update_shards_inode_list+0x22e) >> >> >>> >> >> [0x7f86cc42a1ce] ) 0-: Assertion failed: >> >> >>> >> >> lru_inode_ctx->block_num > >> >> >>> >> >> 0 >> >> >>> >> >> pending frames: >> >> >>> >> >> frame : type(0) op(0) >> >> >>> >> >> frame : type(0) op(0) >> >> >>> >> >> frame : type(0) op(0) >> >> >>> >> >> frame : type(0) op(0) >> >> >>> >> >> frame : type(0) op(0) >> >> >>> >> >> frame : type(0) op(0) >> >> >>> >> >> >> >> >>> >> >> Gluster info (i am testing this on one server with each disk >> >> >>> >> >> representing one brick, this gluster volume is then mounted >> >> >>> >> >> locally >> >> >>> >> >> via fuse) >> >> >>> >> >> >> >> >>> >> >> Volume Name: testSF >> >> >>> >> >> Type: Distributed-Replicate >> >> >>> >> >> Volume ID: 3f205363-5029-40d7-b1b5-216f9639b454 >> >> >>> >> >> Status: Started >> >> >>> >> >> Number of Bricks: 2 x 3 = 6 >> >> >>> >> >> Transport-type: tcp >> >> >>> >> >> Bricks: >> >> >>> >> >> Brick1: 192.168.123.4:/mnt/sdb_mssd/testSF >> >> >>> >> >> Brick2: 192.168.123.4:/mnt/sdc_mssd/testSF >> >> >>> >> >> Brick3: 192.168.123.4:/mnt/sdd_mssd/testSF >> >> >>> >> >> Brick4: 192.168.123.4:/mnt/sde_mssd/testSF >> >> >>> >> >> Brick5: 192.168.123.4:/mnt/sdf_mssd/testSF >> >> >>> >> >> Brick6: 192.168.123.4:/mnt/sdg_mssd/testSF >> >> >>> >> >> Options Reconfigured: >> >> >>> >> >> features.shard-block-size: 16MB >> >> >>> >> >> features.shard: on >> >> >>> >> >> performance.readdir-ahead: on >> >> >>> >> >> >> >> >>> >> >> Gluster version: 3.7.17 >> >> >>> >> >> >> >> >>> >> >> The actual disk usage (Is about 91% full): >> >> >>> >> >> >> >> >>> >> >> /dev/sdb1 235G 202G 22G 91% /mnt/sdb_mssd >> >> >>> >> >> /dev/sdc1 235G 202G 22G 91% /mnt/sdc_mssd >> >> >>> >> >> /dev/sdd1 235G 202G 22G 91% /mnt/sdd_mssd >> >> >>> >> >> /dev/sde1 235G 200G 23G 90% /mnt/sde_mssd >> >> >>> >> >> /dev/sdf1 235G 200G 23G 90% /mnt/sdf_mssd >> >> >>> >> >> /dev/sdg1 235G 200G 23G 90% /mnt/sdg_mssd >> >> >>> >> >> >> >> >>> >> >> Anyone encounter this issue before? >> >> >>> >> >> >> >> >>> >> >> Cw >> >> >>> >> _______________________________________________ >> >> >>> >> Gluster-devel mailing list >> >> >>> >> [email protected] >> >> >>> >> http://www.gluster.org/mailman/listinfo/gluster-devel >> >> >>> > >> >> >>> > >> >> >> >> >> >> >> > >> > >> > >
_______________________________________________ Gluster-devel mailing list [email protected] http://www.gluster.org/mailman/listinfo/gluster-devel
