On Tue, Apr 25, 2017 at 9:22 PM, Amudhan P <[email protected]> wrote:
> Hi Pranith,
>
> if I restart glusterd service in the node alone will it work. bcoz I feel
> that doing volume force start will trigger bitrot process to crawl disks in
> all nodes.
>

Have you enabled bitrot? If not, the bitrot process will not exist. As a
workaround, you can always disable bitrot before executing volume start
force. Please note that volume start force doesn't affect any running
processes.

>
> yes, rebalance fix layout is on process.
>
> regards
> Amudhan
>
>
> On Tue, Apr 25, 2017 at 9:15 PM, Pranith Kumar Karampuri <
> [email protected]> wrote:
>
>> You can restart the process using:
>> gluster volume start <volname> force
>>
>> Did shd on this node heal a lot of data? Based on the kind of memory
>> usage it showed, seems like there is a leak.
>>
>>
>> Sunil,
>> Could you find if there any leaks in this particular version that
>> we might have missed in our testing?
>>
>> On Tue, Apr 25, 2017 at 8:37 PM, Amudhan P <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> In one of my node glustershd process is killed due to OOM and this
>>> happened only in one node out of 40 node cluster.
>>>
>>> Node running on Ubuntu 16.04.2.
>>>
>>> dmesg output:
>>>
>>> [Mon Apr 24 17:21:38 2017] nrpe invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
>>> [Mon Apr 24 17:21:38 2017] nrpe cpuset=/ mems_allowed=0
>>> [Mon Apr 24 17:21:38 2017] CPU: 0 PID: 12626 Comm: nrpe Not tainted 4.4.0-62-generic #83-Ubuntu
>>> [Mon Apr 24 17:21:38 2017] 0000000000000286 00000000fc26b170 ffff88048bf27af0 ffffffff813f7c63
>>> [Mon Apr 24 17:21:38 2017] ffff88048bf27cc8 ffff88082a663c00 ffff88048bf27b60 ffffffff8120ad4e
>>> [Mon Apr 24 17:21:38 2017] ffff88087781a870 ffff88087781a860 ffffea0011285a80 0000000100000001
>>> [Mon Apr 24 17:21:38 2017] Call Trace:
>>> [Mon Apr 24 17:21:38 2017] [<ffffffff813f7c63>] dump_stack+0x63/0x90
>>> [Mon Apr 24 17:21:38 2017] [<ffffffff8120ad4e>] dump_header+0x5a/0x1c5
>>> [Mon Apr 24 17:21:38 2017] [<ffffffff811926c2>] oom_kill_process+0x202/0x3c0
>>> [Mon Apr 24 17:21:38 2017] [<ffffffff81192ae9>] out_of_memory+0x219/0x460
>>> [Mon Apr 24 17:21:38 2017] [<ffffffff81198a5d>] __alloc_pages_slowpath.constprop.88+0x8fd/0xa70
>>> [Mon Apr 24 17:21:38 2017] [<ffffffff81198e56>] __alloc_pages_nodemask+0x286/0x2a0
>>> [Mon Apr 24 17:21:38 2017] [<ffffffff81198f0b>] alloc_kmem_pages_node+0x4b/0xc0
>>> [Mon Apr 24 17:21:38 2017] [<ffffffff8107ea5e>] copy_process+0x1be/0x1b70
>>> [Mon Apr 24 17:21:38 2017] [<ffffffff8122d013>] ? __fd_install+0x33/0xe0
>>> [Mon Apr 24 17:21:38 2017] [<ffffffff81713d01>] ? release_sock+0x111/0x160
>>> [Mon Apr 24 17:21:38 2017] [<ffffffff810805a0>] _do_fork+0x80/0x360
>>> [Mon Apr 24 17:21:38 2017] [<ffffffff8122429c>] ? SyS_select+0xcc/0x110
>>> [Mon Apr 24 17:21:38 2017] [<ffffffff81080929>] SyS_clone+0x19/0x20
>>> [Mon Apr 24 17:21:38 2017] [<ffffffff818385f2>] entry_SYSCALL_64_fastpath+0x16/0x71
>>> [Mon Apr 24 17:21:38 2017] Mem-Info:
>>> [Mon Apr 24 17:21:38 2017] active_anon:553952 inactive_anon:206987 isolated_anon:0
>>>  active_file:3410764 inactive_file:3460179 isolated_file:0
>>>  unevictable:4914 dirty:212868 writeback:0 unstable:0
>>>  slab_reclaimable:386621 slab_unreclaimable:31829
>>>  mapped:6112 shmem:211 pagetables:6178 bounce:0
>>>  free:82623 free_pcp:213 free_cma:0
>>> [Mon Apr 24 17:21:38 2017] Node 0 DMA free:15880kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15964kB managed:15880kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
>>> [Mon Apr 24 17:21:38 2017] lowmem_reserve[]: 0 1868 31944 31944 31944
>>> [Mon Apr 24 17:21:38 2017] Node 0 DMA32 free:133096kB min:3948kB low:4932kB high:5920kB active_anon:170764kB inactive_anon:206296kB active_file:394236kB inactive_file:525288kB unevictable:980kB isolated(anon):0kB isolated(file):0kB present:2033596kB managed:1952976kB mlocked:980kB dirty:1552kB writeback:0kB mapped:3904kB shmem:724kB slab_reclaimable:502176kB slab_unreclaimable:8916kB kernel_stack:1952kB pagetables:1408kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>>> [Mon Apr 24 17:21:38 2017] lowmem_reserve[]: 0 0 30076 30076 30076
>>> [Mon Apr 24 17:21:38 2017] Node 0 Normal free:181516kB min:63600kB low:79500kB high:95400kB active_anon:2045044kB inactive_anon:621652kB active_file:13248820kB inactive_file:13315428kB unevictable:18676kB isolated(anon):0kB isolated(file):0kB present:31322112kB managed:30798036kB mlocked:18676kB dirty:849920kB writeback:0kB mapped:20544kB shmem:120kB slab_reclaimable:1044308kB slab_unreclaimable:118400kB kernel_stack:33792kB pagetables:23304kB unstable:0kB bounce:0kB free_pcp:852kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>>> [Mon Apr 24 17:21:38 2017] lowmem_reserve[]: 0 0 0 0 0
>>> [Mon Apr 24 17:21:38 2017] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB
>>> [Mon Apr 24 17:21:38 2017] Node 0 DMA32: 18416*4kB (UME) 7480*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 133504kB
>>> [Mon Apr 24 17:21:38 2017] Node 0 Normal: 44972*4kB (UMEH) 13*8kB (EH) 13*16kB (H) 13*32kB (H) 8*64kB (H) 2*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 181384kB
>>> [Mon Apr 24 17:21:38 2017] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
>>> [Mon Apr 24 17:21:38 2017] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
>>> [Mon Apr 24 17:21:38 2017] 6878703 total pagecache pages
>>> [Mon Apr 24 17:21:38 2017] 2484 pages in swap cache
>>> [Mon Apr 24 17:21:38 2017] Swap cache stats: add 3533870, delete 3531386, find 3743168/4627884
>>> [Mon Apr 24 17:21:38 2017] Free swap = 14976740kB
>>> [Mon Apr 24 17:21:38 2017] Total swap = 15623164kB
>>> [Mon Apr 24 17:21:38 2017] 8342918 pages RAM
>>> [Mon Apr 24 17:21:38 2017] 0 pages HighMem/MovableOnly
>>> [Mon Apr 24 17:21:38 2017] 151195 pages reserved
>>> [Mon Apr 24 17:21:38 2017] 0 pages cma reserved
>>> [Mon Apr 24 17:21:38 2017] 0 pages hwpoisoned
>>> [Mon Apr 24 17:21:38 2017] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
>>> [Mon Apr 24 17:21:38 2017] [  566]     0   566    15064      460      33       3     1108             0 systemd-journal
>>> [Mon Apr 24 17:21:38 2017] [  602]     0   602    23693      182      16       3        0             0 lvmetad
>>> [Mon Apr 24 17:21:38 2017] [  613]     0   613    11241      589      21       3      264         -1000 systemd-udevd
>>> [Mon Apr 24 17:21:38 2017] [ 1381]   100  1381    25081      440      19       3       25             0 systemd-timesyn
>>> [Mon Apr 24 17:21:38 2017] [ 1447]     0  1447     1100      307       7       3        0             0 acpid
>>> [Mon Apr 24 17:21:38 2017] [ 1449]     0  1449     7252      374      21       3       47             0 cron
>>> [Mon Apr 24 17:21:38 2017] [ 1451]     0  1451    77253      994      19       3       10             0 lxcfs
>>> [Mon Apr 24 17:21:38 2017] [ 1483]     0  1483     6511      413      18       3       42             0 atd
>>> [Mon Apr 24 17:21:38 2017] [ 1505]     0  1505     7157      286      18       3       36             0 systemd-logind
>>> [Mon Apr 24 17:21:38 2017] [ 1508]   104  1508    64099      376      27       4      712             0 rsyslogd
>>> [Mon Apr 24 17:21:38 2017] [ 1510]   107  1510    10723      497      25       3       45          -900 dbus-daemon
>>> [Mon Apr 24 17:21:38 2017] [ 1521]     0  1521    68970      178      38       3      170             0 accounts-daemon
>>> [Mon Apr 24 17:21:38 2017] [ 1526]     0  1526     6548      785      16       3       63             0 smartd
>>> [Mon Apr 24 17:21:38 2017] [ 1528]     0  1528    54412      146      31       5     1806             0 snapd
>>> [Mon Apr 24 17:21:38 2017] [ 1578]     0  1578     3416      335      11       3       24             0 mdadm
>>> [Mon Apr 24 17:21:38 2017] [ 1595]     0  1595    16380      470      35       3      157         -1000 sshd
>>> [Mon Apr 24 17:21:38 2017] [ 1610]     0  1610    69295      303      40       4       57             0 polkitd
>>> [Mon Apr 24 17:21:38 2017] [ 1618]     0  1618     1306       31       8       3        0             0 iscsid
>>> [Mon Apr 24 17:21:38 2017] [ 1619]     0  1619     1431      877       8       3        0           -17 iscsid
>>> [Mon Apr 24 17:21:38 2017] [ 1624]     0  1624   126363     8027     122       4    22441             0 glusterd
>>> [Mon Apr 24 17:21:38 2017] [ 1688]     0  1688     4884      430      15       3       46             0 irqbalance
>>> [Mon Apr 24 17:21:38 2017] [ 1699]     0  1699     3985      348      13       3        0             0 agetty
>>> [Mon Apr 24 17:21:38 2017] [ 7001]     0  7001   500631    27874     145       5     3356             0 glusterfsd
>>> [Mon Apr 24 17:21:38 2017] [ 8136]     0  8136   500631    28760     141       5     2390             0 glusterfsd
>>> [Mon Apr 24 17:21:38 2017] [ 9280]     0  9280   533529    27752     135       5     3200             0 glusterfsd
>>> [Mon Apr 24 17:21:38 2017] [12626]   111 12626     5991      420      16       3      113             0 nrpe
>>> [Mon Apr 24 17:21:38 2017] [14342]     0 14342   533529    28377     135       5     2176             0 glusterfsd
>>> [Mon Apr 24 17:21:38 2017] [14361]     0 14361   534063    29190     136       5     1972             0 glusterfsd
>>> [Mon Apr 24 17:21:38 2017] [14380]     0 14380   533529    28104     136       6     2437             0 glusterfsd
>>> [Mon Apr 24 17:21:38 2017] [14399]     0 14399   533529    27552     131       5     2808             0 glusterfsd
>>> [Mon Apr 24 17:21:38 2017] [14418]     0 14418   533529    29588     138       5     2697             0 glusterfsd
>>> [Mon Apr 24 17:21:38 2017] [14437]     0 14437   517080    28671     146       5     2170             0 glusterfsd
>>> [Mon Apr 24 17:21:38 2017] [14456]     0 14456   533529    28083     139       5     3359             0 glusterfsd
>>> [Mon Apr 24 17:21:38 2017] [14475]     0 14475   533529    28054     134       5     2954             0 glusterfsd
>>> [Mon Apr 24 17:21:38 2017] [14494]     0 14494   533529    28594     135       5     2311             0 glusterfsd
>>> [Mon Apr 24 17:21:38 2017] [14513]     0 14513   533529    28911     138       5     2833             0 glusterfsd
>>> [Mon Apr 24 17:21:38 2017] [14532]     0 14532   533529    28259     134       6     3145             0 glusterfsd
>>> [Mon Apr 24 17:21:38 2017] [14551]     0 14551   533529    27875     138       5     2267             0 glusterfsd
>>> [Mon Apr 24 17:21:38 2017] [14570]     0 14570   484716    28247     142       5     2875             0 glusterfsd
>>> [Mon Apr 24 17:21:38 2017] [27646]     0 27646  3697561   202086    2830      17    16528             0 glusterfs
>>> [Mon Apr 24 17:21:38 2017] [27655]     0 27655   787371    29588     197       6    25472             0 glusterfs
>>> [Mon Apr 24 17:21:38 2017] [27665]     0 27665   689585      605     108       6     7008             0 glusterfs
>>> [Mon Apr 24 17:21:38 2017] [29878]     0 29878   193833    36054     241       4    41182             0 glusterfs
>>> [Mon Apr 24 17:21:38 2017] Out of memory: Kill process 27646 (glusterfs) score 17 or sacrifice child
>>> [Mon Apr 24 17:21:38 2017] Killed process 27646 (glusterfs) total-vm:14790244kB, anon-rss:795040kB, file-rss:13304kB
>>>
>>> /var/log/glusterfs/glusterd.log
>>> [2017-04-24 11:53:51.359603] I [MSGID: 106006] [glusterd-svc-mgmt.c:327:glusterd_svc_common_rpc_notify] 0-management: glustershd has disconnected from glusterd.
>>>
>>> what would have gone wrong?
>>>
>>> regards
>>> Amudhan
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> [email protected]
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>>
>>
>> --
>> Pranith
>>
>
>
> _______________________________________________
> Gluster-users mailing list
> [email protected]
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
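
For anyone reading the OOM report quoted above, here is a minimal Python sketch that ranks the rows of the kernel's process table by resident memory. It is only an illustration, not a Gluster tool: the names parse_oom_table and top_by_rss are hypothetical, the sample rows are copied from the dmesg output in this thread, and it assumes the rss and total_vm columns are counted in 4 kB pages (the usual x86_64 page size).

```python
import re

# Assumption: 4 kB pages, so page counts convert to kB by multiplying by 4.
PAGE_KB = 4

# One row of the OOM-killer process table:
# [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
ROW = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+"
    r"(?P<total_vm>\d+)\s+(?P<rss>\d+)\s+(?P<nr_ptes>\d+)\s+"
    r"(?P<nr_pmds>\d+)\s+(?P<swapents>\d+)\s+"
    r"(?P<oom_score_adj>-?\d+)\s+(?P<name>\S+)"
)


def parse_oom_table(text):
    """Return one dict per process row found in dmesg text (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        match = ROW.search(line)
        if match:
            # Every column is an integer except the process name.
            rows.append({key: (value if key == "name" else int(value))
                         for key, value in match.groupdict().items()})
    return rows


def top_by_rss(rows, n=3):
    """Return the n rows with the largest resident set size (hypothetical helper)."""
    return sorted(rows, key=lambda row: row["rss"], reverse=True)[:n]


# Rows copied from the dmesg output quoted in this thread.
SAMPLE = """\
[Mon Apr 24 17:21:38 2017] [ 1624] 0 1624 126363 8027 122 4 22441 0 glusterd
[Mon Apr 24 17:21:38 2017] [14418] 0 14418 533529 29588 138 5 2697 0 glusterfsd
[Mon Apr 24 17:21:38 2017] [27646] 0 27646 3697561 202086 2830 17 16528 0 glusterfs
[Mon Apr 24 17:21:38 2017] [29878] 0 29878 193833 36054 241 4 41182 0 glusterfs
"""

if __name__ == "__main__":
    for row in top_by_rss(parse_oom_table(SAMPLE)):
        print(row["pid"], row["name"],
              row["rss"] * PAGE_KB, "kB rss,",
              row["total_vm"] * PAGE_KB, "kB total_vm")
```

Under the 4 kB page assumption, PID 27646 works out to 202086 pages = 808344 kB resident, which equals anon-rss (795040 kB) plus file-rss (13304 kB) in the "Killed process" line, and 3697561 pages = 14790244 kB matches the reported total-vm. The script only ranks processes; it does not identify which glusterfs process is the self-heal daemon.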
