So my question is: how did you get from each of the bricks being killed ("Sep 22 19:57:32 glu03 bricks-vol-video-asset-manager-brick1[45162]: [2016-09-22 17:57:32.714461] M [MSGID: 113075] [posix-helpers.c:1850:posix_health_check_thread_proc] 0-vol-video-asset-manager-posix: still alive! -> SIGTERM") to having them running again?

Maybe there's a clue in the brick logs. Have you looked at those?
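For example (just a sketch, using the volume and brick names from your listing), something like this on glu03 would show whether the brick processes came back and what the brick log says around that time:

# Did the brick processes come back, and under which PIDs?
gluster volume status vol-video-asset-manager

# Brick logs live under /var/log/glusterfs/bricks/; the filename mirrors the
# brick path. Check what happened right after the SIGTERM at 17:57:32
# (note your brick-log-level is ERROR, so the log may be sparse).
grep -A5 '2016-09-22 17:5' /var/log/glusterfs/bricks/bricks-vol-video-asset-manager-brick1.log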

On 09/23/2016 09:06 AM, Luca Gervasi wrote:
Hi guys,
I've got a strange problem involving this timeline (it matches the "Log fragment 1" excerpt below):
19:56:50: a disk is detached from my system. This disk is actually the brick of volume V.
19:56:50: LVM sees the disk as unreachable and starts its maintenance procedures.
19:56:50: LVM unmounts my thin-provisioned volumes.
19:57:02: the health check on the affected bricks fails, moving them to a down state.
19:57:32: the XFS filesystems are unmounted.

At this point the brick filesystem is no longer mounted, and the underlying filesystem is empty (the brick directory is missing too). My assumption was that Gluster would stop itself in such conditions: it does not.
Gluster then slowly fills my entire root partition, recreating its full directory tree.

My only warning sign is the root disk's inode usage climbing to 100%.
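A rough check like the one below would at least flag it earlier (just a sketch, not something I have in place; the 90% threshold and the logger call are placeholders):

# Early-warning check: complain when root filesystem inode usage passes 90%
pct=$(df -i / | awk 'NR==2 { gsub("%", "", $5); print $5 }')
if [ "$pct" -ge 90 ]; then
    logger -p user.warning "root filesystem inode usage at ${pct}%"
fi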

I've read the release notes for every version after mine (3.7.14, 3.7.15) without finding any relevant fix, so at this point I'm pretty sure this is an undocumented bug.
All three servers are configured identically.

Could you please help me understand how to prevent Gluster from continuing to write to an unmounted filesystem? Thanks.
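The only defensive measure I can think of (a sketch only, I haven't verified it against Gluster) is to make the bare mountpoint directories immutable, so nothing can create files under them while the brick filesystems are not mounted:

# With a brick filesystem unmounted, mark the empty mountpoint directory on
# the root filesystem immutable; mounting over it still works, but nothing
# can recreate the brick tree underneath it while it is unmounted.
umount /bricks/vol-homes        # only if it is currently mounted
chattr +i /bricks/vol-homes
mount /bricks/vol-homes         # remounts from fstab

# Repeat for /bricks/vol-apps-data, /bricks/vol-search-data and
# /bricks/vol-video-asset-manager.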

I'm running a 3-node replica on 3 Azure VMs. This is the configuration:

MD (yes, I use md to aggregate 4 disks into a single 4 TB volume):
/dev/md128:
        Version : 1.2
  Creation Time : Mon Aug 29 18:10:45 2016
     Raid Level : raid0
     Array Size : 4290248704 (4091.50 GiB 4393.21 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Mon Aug 29 18:10:45 2016
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 512K

           Name : 128
           UUID : d5c51214:43e48da9:49086616:c1371514
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       8       80        0      active sync /dev/sdf
       1       8       96        1      active sync /dev/sdg
       2       8      112        2      active sync /dev/sdh
       3       8      128        3      active sync /dev/sdi

PV, VG, LV status
  PV         VG      Fmt  Attr PSize PFree DevSize PV UUID
  /dev/md127 VGdata  lvm2 a--  2.00t 2.00t 2.00t   Kxb6C0-FLIH-4rB1-DKyf-IQuR-bbPE-jm2mu0
  /dev/md128 gluster lvm2 a--  4.00t 1.07t 4.00t   lDazuw-zBPf-Duis-ZDg1-3zfg-53Ba-2ZF34m

  VG      Attr   Ext   #PV #LV #SN VSize VFree VG UUID
  VGdata  wz--n- 4.00m   1   0   0 2.00t 2.00t XI2V2X-hdxU-0Jrn-TN7f-GSEk-7aNs-GCdTtn
  gluster wz--n- 4.00m   1   6   0 4.00t 1.07t ztxX4f-vTgN-IKop-XePU-OwqW-T9k6-A6uDk0

  LV                  VG      #Seg Attr       LSize   Maj Min KMaj KMin Pool     Data% Meta% LV UUID
  apps-data           gluster    1 Vwi-aotz--  50.00g  -1  -1 253   12  thinpool  0.08       znUMbm-ax1N-R7aj-dxLc-gtif-WOvk-9QC8tq
  feed                gluster    1 Vwi-aotz-- 100.00g  -1  -1 253   14  thinpool  0.08       hZ4Isk-dELG-lgFs-2hJ6-aYid-8VKg-3jJko9
  homes               gluster    1 Vwi-aotz--   1.46t  -1  -1 253   11  thinpool 58.58       salIPF-XvsA-kMnm-etjf-Uaqy-2vA9-9WHPkH
  search-data         gluster    1 Vwi-aotz-- 100.00g  -1  -1 253   13  thinpool 16.41       Z5hoa3-yI8D-dk5Q-2jWH-N5R2-ge09-RSjPpQ
  thinpool            gluster    1 twi-aotz--   2.93t  -1  -1 253    9           29.85 60.00 oHTbgW-tiPh-yDfj-dNOm-vqsF-fBNH-o1izx2
  video-asset-manager gluster    1 Vwi-aotz-- 100.00g  -1  -1 253   15  thinpool  0.07       4dOXga-96Wa-u3mh-HMmE-iX1I-o7ov-dtJ8lZ

Gluster volume configuration (all volumes use the exact same configuration, so listing them all would be redundant):
Volume Name: vol-homes
Type: Replicate
Volume ID: 0c8fa62e-dd7e-429c-a19a-479404b5e9c6
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: glu01.prd.azr:/bricks/vol-homes/brick1
Brick2: glu02.prd.azr:/bricks/vol-homes/brick1
Brick3: glu03.prd.azr:/bricks/vol-homes/brick1
Options Reconfigured:
performance.readdir-ahead: on
cluster.server-quorum-type: server
nfs.disable: disable
cluster.lookup-unhashed: auto
performance.nfs.quick-read: on
performance.nfs.read-ahead: on
performance.cache-size: 4096MB
cluster.self-heal-daemon: enable
diagnostics.brick-log-level: ERROR
diagnostics.client-log-level: ERROR
nfs.rpc-auth-unix: off
nfs.acl: off
performance.nfs.io-cache: on
performance.client-io-threads: on
performance.nfs.stat-prefetch: on
performance.nfs.io-threads: on
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
performance.md-cache-timeout: 1
performance.cache-refresh-timeout: 1
performance.io-thread-count: 16
performance.high-prio-threads: 16
performance.normal-prio-threads: 16
performance.low-prio-threads: 16
performance.least-prio-threads: 1
cluster.server-quorum-ratio: 60

fstab:
/dev/gluster/homes /bricks/vol-homes xfs defaults,noatime,nobarrier,nofail 0 2

Software:
CentOS Linux release 7.1.1503 (Core)
glusterfs-api-3.7.13-1.el7.x86_64
glusterfs-libs-3.7.13-1.el7.x86_64
glusterfs-3.7.13-1.el7.x86_64
glusterfs-fuse-3.7.13-1.el7.x86_64
glusterfs-server-3.7.13-1.el7.x86_64
glusterfs-client-xlators-3.7.13-1.el7.x86_64
glusterfs-cli-3.7.13-1.el7.x86_64


Log fragment 1:
Sep 22 19:56:50 glu03 lvm[868]: WARNING: Device for PV lDazuw-zBPf-Duis-ZDg1-3zfg-53Ba-2ZF34m not found or rejected by a filter.
Sep 22 19:56:50 glu03 lvm[868]: Cannot change VG gluster while PVs are missing.
Sep 22 19:56:50 glu03 lvm[868]: Consider vgreduce --removemissing.
Sep 22 19:56:50 glu03 lvm[868]: Failed to extend thin metadata gluster-thinpool-tpool.
Sep 22 19:56:50 glu03 lvm[868]: Unmounting thin volume gluster-thinpool-tpool from /bricks/vol-homes.
Sep 22 19:56:50 glu03 lvm[868]: Unmounting thin volume gluster-thinpool-tpool from /bricks/vol-search-data.
Sep 22 19:56:50 glu03 lvm[868]: Unmounting thin volume gluster-thinpool-tpool from /bricks/vol-apps-data.
Sep 22 19:56:50 glu03 lvm[868]: Unmounting thin volume gluster-thinpool-tpool from /bricks/vol-video-asset-manager.
Sep 22 19:57:02 glu03 bricks-vol-video-asset-manager-brick1[45162]: [2016-09-22 17:57:02.713428] M [MSGID: 113075] [posix-helpers.c:1844:posix_health_check_thread_proc] 0-vol-video-asset-manager-posix: health-check failed, going down
Sep 22 19:57:05 glu03 bricks-vol-apps-data-brick1[44536]: [2016-09-22 17:57:05.186146] M [MSGID: 113075] [posix-helpers.c:1844:posix_health_check_thread_proc] 0-vol-apps-data-posix: health-check failed, going down
Sep 22 19:57:18 glu03 bricks-vol-search-data-brick1[40928]: [2016-09-22 17:57:18.674279] M [MSGID: 113075] [posix-helpers.c:1844:posix_health_check_thread_proc] 0-vol-search-data-posix: health-check failed, going down
Sep 22 19:57:32 glu03 bricks-vol-video-asset-manager-brick1[45162]: [2016-09-22 17:57:32.714461] M [MSGID: 113075] [posix-helpers.c:1850:posix_health_check_thread_proc] 0-vol-video-asset-manager-posix: still alive! -> SIGTERM
Sep 22 19:57:32 glu03 kernel: XFS (dm-15): Unmounting Filesystem
Sep 22 19:57:35 glu03 bricks-vol-apps-data-brick1[44536]: [2016-09-22 17:57:35.186352] M [MSGID: 113075] [posix-helpers.c:1850:posix_health_check_thread_proc] 0-vol-apps-data-posix: still alive! -> SIGTERM
Sep 22 19:57:35 glu03 kernel: XFS (dm-12): Unmounting Filesystem
Sep 22 19:57:48 glu03 bricks-vol-search-data-brick1[40928]: [2016-09-22 17:57:48.674444] M [MSGID: 113075] [posix-helpers.c:1850:posix_health_check_thread_proc] 0-vol-search-data-posix: still alive! -> SIGTERM
Sep 22 19:57:48 glu03 kernel: XFS (dm-13): Unmounting Filesystem
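
For reference, the health-check messages above come from the brick's posix health-check thread; how often it probes the underlying filesystem is controlled per volume by storage.health-check-interval (30 seconds by default). Purely illustrative:

# Probe the brick filesystem every 10 seconds instead of the default 30
gluster volume set vol-homes storage.health-check-interval 10
gluster volume info vol-homes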


_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
