Filed a bug report. I was not able to reproduce the issue on x86 hardware. https://bugzilla.redhat.com/show_bug.cgi?id=1811373
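[Editor's aside, not part of the original message: the diagnosis in the thread below turns on spotting the one offline brick in the `gluster volume status` table. A minimal sketch of filtering that output for offline bricks ("N" in the Online column); the here-doc sample is trimmed from the status table quoted later in the thread, and on a live node you would pipe the real `gluster volume status disp1` output instead.]

```shell
#!/bin/sh
# Sketch: list bricks that `gluster volume status` reports as offline.
# Columns in the status table: <Brick> <host:path> <TCP Port> <RDMA Port> <Online> <Pid>,
# so "Online" is field 5. The sample below stands in for live output.
gluster_status_sample() {
cat <<'EOF'
Brick gluster11:/exports/sda/brick1/disp1  49152  0    Y  2026
Brick gluster12:/exports/sda/brick1/disp1  N/A    N/A  N  N/A
EOF
}

# Print the brick path of every offline brick.
gluster_status_sample | awk '$1 == "Brick" && $5 == "N" { print $2 }'
```

Run against the sample, this prints only `gluster12:/exports/sda/brick1/disp1`, the brick that went down in the thread.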
On Mon, Mar 2, 2020 at 1:58 AM Strahil Nikolov <hunter86...@yahoo.com> wrote:

> On March 2, 2020 3:29:06 AM GMT+02:00, Fox <foxxz....@gmail.com> wrote:
> > The brick is mounted. However, glusterfsd crashes shortly after
> > startup. This happens on any host that needs to heal a dispersed
> > volume.
> >
> > I spent today doing a clean rebuild of the cluster: a clean install
> > of Ubuntu 18 and Gluster 7.2, create a dispersed volume, then reboot
> > one of the cluster members while the volume is up and online. When
> > that cluster member comes back, it cannot heal.
> >
> > I was able to replicate this behavior with Raspberry Pis running
> > Raspbian and Gluster 5, so it looks like it is not limited to the
> > specific hardware or version of Gluster I'm using, but is perhaps
> > the ARM architecture as a whole.
> >
> > Thank you for your help. Aside from not using dispersed volumes, I
> > don't think there is much more I can do. Submit a bug report, I
> > guess :)
> >
> > On Sun, Mar 1, 2020 at 12:02 PM Strahil Nikolov <hunter86...@yahoo.com> wrote:
> >
> > > On March 1, 2020 6:22:59 PM GMT+02:00, Fox <foxxz....@gmail.com> wrote:
> > > > Yes, the brick was up and running, and I can see files on the
> > > > brick created by connected clients up until the node was
> > > > rebooted.
> > > >
> > > > This is what the volume status looks like after gluster12 was
> > > > rebooted. Prior to the reboot it showed as online and was
> > > > otherwise operational.
> > > >
> > > > root@gluster01:~# gluster volume status
> > > > Status of volume: disp1
> > > > Gluster process                            TCP Port  RDMA Port  Online  Pid
> > > > ------------------------------------------------------------------------------
> > > > Brick gluster01:/exports/sda/brick1/disp1  49152     0          Y       3931
> > > > Brick gluster02:/exports/sda/brick1/disp1  49152     0          Y       2755
> > > > Brick gluster03:/exports/sda/brick1/disp1  49152     0          Y       2787
> > > > Brick gluster04:/exports/sda/brick1/disp1  49152     0          Y       2780
> > > > Brick gluster05:/exports/sda/brick1/disp1  49152     0          Y       2764
> > > > Brick gluster06:/exports/sda/brick1/disp1  49152     0          Y       2760
> > > > Brick gluster07:/exports/sda/brick1/disp1  49152     0          Y       2740
> > > > Brick gluster08:/exports/sda/brick1/disp1  49152     0          Y       2729
> > > > Brick gluster09:/exports/sda/brick1/disp1  49152     0          Y       2772
> > > > Brick gluster10:/exports/sda/brick1/disp1  49152     0          Y       2791
> > > > Brick gluster11:/exports/sda/brick1/disp1  49152     0          Y       2026
> > > > Brick gluster12:/exports/sda/brick1/disp1  N/A       N/A        N       N/A
> > > > Self-heal Daemon on localhost              N/A       N/A        Y       3952
> > > > Self-heal Daemon on gluster03              N/A       N/A        Y       2808
> > > > Self-heal Daemon on gluster02              N/A       N/A        Y       2776
> > > > Self-heal Daemon on gluster06              N/A       N/A        Y       2781
> > > > Self-heal Daemon on gluster07              N/A       N/A        Y       2761
> > > > Self-heal Daemon on gluster05              N/A       N/A        Y       2785
> > > > Self-heal Daemon on gluster08              N/A       N/A        Y       2750
> > > > Self-heal Daemon on gluster04              N/A       N/A        Y       2801
> > > > Self-heal Daemon on gluster09              N/A       N/A        Y       2793
> > > > Self-heal Daemon on gluster11              N/A       N/A        Y       2047
> > > > Self-heal Daemon on gluster10              N/A       N/A        Y       2812
> > > > Self-heal Daemon on gluster12              N/A       N/A        Y       542
> > > >
> > > > Task Status of Volume disp1
> > > > ------------------------------------------------------------------------------
> > > > There are no active volume tasks
> > > >
> > > > On Sun, Mar 1, 2020 at 2:01 AM Strahil Nikolov <hunter86...@yahoo.com> wrote:
> > > >
> > > > > On March 1, 2020 6:08:31 AM GMT+02:00, Fox <foxxz....@gmail.com> wrote:
> > > > > > I am using a dozen Odroid HC2 ARM systems, each with a single
> > > > > > HD/brick, running Ubuntu 18 and GlusterFS 7.2 installed from
> > > > > > the Gluster PPA.
> > > > > >
> > > > > > I can create a dispersed volume and use it. If one of the
> > > > > > cluster members ducks out, say gluster12 reboots, when it
> > > > > > comes back online it shows as connected in the peer list,
> > > > > > but using
> > > > > >
> > > > > > gluster volume heal <volname> info summary
> > > > > >
> > > > > > it shows up as:
> > > > > >
> > > > > > Brick gluster12:/exports/sda/brick1/disp1
> > > > > > Status: Transport endpoint is not connected
> > > > > > Total Number of entries: -
> > > > > > Number of entries in heal pending: -
> > > > > > Number of entries in split-brain: -
> > > > > > Number of entries possibly healing: -
> > > > > >
> > > > > > Trying to force a full heal doesn't fix it. The cluster
> > > > > > member otherwise works and heals for other non-dispersed
> > > > > > volumes, even while showing up as disconnected for the
> > > > > > dispersed volume.
> > > > > >
> > > > > > I have attached a terminal log of the volume creation and
> > > > > > diagnostic output. Could this be an ARM-specific problem?
> > > > > >
> > > > > > I tested a similar setup on x86 virtual machines. They were
> > > > > > able to heal a dispersed volume no problem. One thing I see
> > > > > > in the ARM logs that I don't see in the x86 logs is lots of
> > > > > > this:
> > > > > >
> > > > > > [2020-03-01 03:54:45.856769] W [MSGID: 122035] [ec-common.c:668:ec_child_select] 0-disp1-disperse-0: Executing operation with some subvolumes unavailable. (800). FOP : 'LOOKUP' failed on '(null)' with gfid 0d3c4cf3-e09c-4b9a-87d3-cdfc4f49b692
> > > > > > [2020-03-01 03:54:45.910203] W [MSGID: 122035] [ec-common.c:668:ec_child_select] 0-disp1-disperse-0: Executing operation with some subvolumes unavailable. (800). FOP : 'LOOKUP' failed on '(null)' with gfid 0d806805-81e4-47ee-a331-1808b34949bf
> > > > > > [2020-03-01 03:54:45.932734] I [rpc-clnt.c:1963:rpc_clnt_reconfig] 0-disp1-client-11: changing port to 49152 (from 0)
> > > > > > [2020-03-01 03:54:45.956803] W [MSGID: 122035] [ec-common.c:668:ec_child_select] 0-disp1-disperse-0: Executing operation with some subvolumes unavailable. (800). FOP : 'LOOKUP' failed on '(null)' with gfid d5768bad-7409-40f4-af98-4aef391d7ae4
> > > > > > [2020-03-01 03:54:46.000102] W [MSGID: 122035] [ec-common.c:668:ec_child_select] 0-disp1-disperse-0: Executing operation with some subvolumes unavailable. (800). FOP : 'LOOKUP' failed on '(null)' with gfid 216f5583-e1b4-49cf-bef9-8cd34617beaf
> > > > > > [2020-03-01 03:54:46.044184] W [MSGID: 122035] [ec-common.c:668:ec_child_select] 0-disp1-disperse-0: Executing operation with some subvolumes unavailable. (800). FOP : 'LOOKUP' failed on '(null)' with gfid 1b610b49-2d69-4ee6-a440-5d3edd6693d1
> > > > >
> > > > > Hi,
> > > > >
> > > > > Are you sure that the gluster bricks on this node are up and
> > > > > running? What is the output of 'gluster volume status' on this
> > > > > system?
> > > > >
> > > > > Best Regards,
> > > > > Strahil Nikolov
> > >
> > > This seems like the brick is down.
> > > Check with 'ps aux | grep glusterfsd | grep disp1' on 'gluster12'.
> > > Most probably it is down, and you need to verify that the brick is
> > > properly mounted.
> > >
> > > Best Regards,
> > > Strahil Nikolov
>
> Hi Fox,
>
> Submit a bug and provide a link in the mailing list (add gluster-devel
> in CC once you register for that).
> Most probably it's a small thing that can be easily fixed.
>
> Have you tried:
> gluster volume start <VOLNAME> force
>
> Best Regards,
> Strahil Nikolov
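[Editor's aside, not part of the original thread: Strahil's two suggestions, check for the brick process with `ps aux | grep glusterfsd | grep disp1`, and if it is gone, verify the mount and run `gluster volume start <VOLNAME> force`, can be condensed into one sketch. The `check_brick` helper and the canned `ps` sample line are hypothetical illustrations; on gluster12 you would pipe real `ps aux` output instead of the sample.]

```shell
#!/bin/sh
# Hypothetical helper condensing the advice above: reads a `ps aux`-style
# listing on stdin and succeeds only if a glusterfsd process serving the
# disp1 brick is present. The [g] bracket keeps grep from matching its
# own command line when this is fed from a live `ps aux`.
check_brick() {
  grep '[g]lusterfsd' | grep -q 'disp1'
}

# Canned sample standing in for `ps aux` on gluster12 after the crash:
# only the self-heal daemon (glustershd) survived, not the brick process.
ps_sample='root  542  /usr/sbin/glusterfs -s localhost --volfile-id shd/disp1 glustershd'

if printf '%s\n' "$ps_sample" | check_brick; then
  echo "brick process is up"
else
  # This branch fires for the sample above: no glusterfsd for disp1.
  echo "brick is down: check the mount, then try 'gluster volume start disp1 force'"
fi
```

With the sample listing, the script takes the "brick is down" branch, matching what the thread observed on gluster12.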
_______________________________________________
Community Meeting Calendar:

Schedule -
Every Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel