Hi everyone,

Well, I went back to Gluster's own repo (https://download.opensuse.org/repositories/home:/glusterfs:/Leap15.2-9/openSUSE_Leap_15.2/x86_64/) rather than using OpenSUSE's filesystems one (https://download.opensuse.org/repositories/filesystems/openSUSE_Leap_15.2/x86_64/), and tried upgrading from 7.9 to 9.1 again. This time, everything worked fine, though the status commands kept failing until all the nodes were upgraded, due to this:
gluster volume status
Locking failed on hive. Please check log file for details.
Locking failed on citadel. Please check log file for details.

Something about RPC:

[2021-07-27 00:47:49.288575] E [rpcsvc.c:194:rpcsvc_get_program_vector_sizer] 0-rpc-service: RPC procedure 7 not available for Program GlusterD svc mgmt v3
[2021-07-27 00:47:49.288608] E [rpcsvc.c:350:rpcsvc_program_actor] 0-rpc-service: RPC Program procedure not available for procedure 7 in GlusterD svc mgmt v3 for 192.168.210.155:49141

During this time, the servers kept syncing, and the fuse fs was available, so there was no downtime. And as I mentioned, after upgrading, everything seems fine.

With this in mind, it'd have been much more user-friendly if the version from the filesystems repo without IPv6 options compiled in didn't explode the way it did. I couldn't even probe for peers after installing it. IMO there's still a legitimate issue to be fixed here and a test to add to the codebase.

Sincerely,
Artem

--
Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
beerpla.net | @ArtemR <http://twitter.com/ArtemR>


On Fri, Jul 23, 2021 at 7:09 PM Strahil Nikolov <hunter86...@yahoo.com> wrote:
> Can you try setting "transport.address-family: inet" at /etc/glusterfs/glusterd.vol on all nodes?
>
> About the rpms, if they are not yet built - the only other option is to build them from source.
>
> I assume that the second try is on a fresh set of systems without any remnants of an old Gluster install.
>
> Best Regards,
> Strahil Nikolov
>
>
> On Friday, July 23, 2021, at 07:55:01 GMT+3, Artem Russakovskii <archon...@gmail.com> wrote:
>
> Hi Strahil,
>
> I am using repo builds from https://download.opensuse.org/repositories/filesystems/openSUSE_Leap_15.2/x86_64/ (currently glusterfs-9.1-lp152.88.2.x86_64.rpm) and don't build them.
>
> Perhaps the builds at https://download.opensuse.org/repositories/home:/glusterfs:/Leap15.2-9/openSUSE_Leap_15.2/x86_64/ are better (currently glusterfs-9.1-lp152.112.1.x86_64.rpm), does anyone know?
>
> None of the repos currently have 9.3.
>
> And regardless, I don't care for gluster using IPv6 if IPv4 works fine. Is there a way to make it stop trying to use IPv6 and only use IPv4?
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police, APK Mirror, Illogical Robot LLC
> beerpla.net | @ArtemR
>
>
> On Thu, Jul 22, 2021 at 9:09 PM Strahil Nikolov <hunter86...@yahoo.com> wrote:
>
> Did you try with the latest 9.X? Based on the release notes that should be 9.3.
>
> Best Regards,
>
> Strahil Nikolov
>
> >> On Fri, Jul 23, 2021 at 3:06, Artem Russakovskii <archon...@gmail.com> wrote:
> >>
> >> Hi all,
> >>
> >> I just filed this ticket https://github.com/gluster/glusterfs/issues/2648, and wanted to bring it to your attention. Any feedback would be appreciated.
> >>
> >> Description of problem:
> >> We have a 4-node replicate cluster running gluster 7.9. I'm currently setting up a new cluster on a new set of machines and went straight for gluster 9.1.
> >> However, I was unable to probe any servers due to this error:
> >> [2021-07-17 00:31:05.228609 +0000] I [MSGID: 106487] [glusterd-handler.c:1160:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req nexus2 24007
> >> [2021-07-17 00:31:05.229727 +0000] E [MSGID: 101075] [common-utils.c:3657:gf_is_local_addr] 0-management: error in getaddrinfo [{ret=Name or service not known}]
> >> [2021-07-17 00:31:05.230785 +0000] E [MSGID: 106408] [glusterd-peer-utils.c:217:glusterd_peerinfo_find_by_hostname] 0-management: error in getaddrinfo: Name or service not known [Unknown error -2]
> >> [2021-07-17 00:31:05.353971 +0000] I [MSGID: 106128] [glusterd-handler.c:3719:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: nexus2 (24007)
> >> [2021-07-17 00:31:05.375871 +0000] W [MSGID: 106061] [glusterd-handler.c:3488:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
> >> [2021-07-17 00:31:05.375903 +0000] I [rpc-clnt.c:1010:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
> >> [2021-07-17 00:31:05.377021 +0000] E [MSGID: 101075] [common-utils.c:520:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=10}, {ret=Name or service not known}]
> >> [2021-07-17 00:31:05.377043 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host nexus2
> >> [2021-07-17 00:31:05.377147 +0000] I [MSGID: 106498] [glusterd-handler.c:3648:glusterd_friend_add] 0-management: connect returned 0
> >> [2021-07-17 00:31:05.377201 +0000] I [MSGID: 106004] [glusterd-handler.c:6427:__glusterd_peer_rpc_notify] 0-management: Peer <nexus2> (<00000000-0000-0000-0000-000000000000>), in state <Establishing Connection>, has disconnected from glusterd.
> >> [2021-07-17 00:31:05.377453 +0000] E [MSGID: 101032] [store.c:464:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
> >>
> >> I then wiped the /var/lib/glusterd dir to start clean and downgraded to 7.9, then attempted to peer probe again. This time, it worked fine, proving 7.9 is working, same as it is on prod.
> >>
> >> At this point, I made a volume, started it, and played around with testing to my satisfaction. Then I decided to see what would happen if I tried to upgrade this working volume from 7.9 to 9.1.
> >> The end result is:
> >> * gluster volume status is only showing the local gluster node and not any of the remote nodes
> >> * data does seem to replicate, so the connection between the servers is actually established
> >> * logs are now filled with constantly repeating messages like so:
> >> [2021-07-22 23:29:31.039004 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host nexus2
> >> [2021-07-22 23:29:31.039212 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host citadel
> >> [2021-07-22 23:29:31.039304 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host hive
> >> The message "E [MSGID: 101075] [common-utils.c:520:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=10}, {ret=Name or service not known}]" repeated 119 times between [2021-07-22 23:27:34.025983 +0000] and [2021-07-22 23:29:31.039302 +0000]
> >> [2021-07-22 23:29:34.039369 +0000] E [MSGID: 101075] [common-utils.c:520:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=10}, {ret=Name or service not known}]
> >> [2021-07-22 23:29:34.039441 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host nexus2
> >> [2021-07-22 23:29:34.039558 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host citadel
> >> [2021-07-22 23:29:34.039659 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host hive
> >> [2021-07-22 23:29:37.039741 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host nexus2
> >> [2021-07-22 23:29:37.039921 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host citadel
> >> [2021-07-22 23:29:37.040015 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host hive
> >>
> >> When I issue a command in cli:
> >> ==> cli.log <==
> >> [2021-07-22 23:38:11.802596 +0000] I [cli.c:840:main] 0-cli: Started running gluster with version 9.1
> >> **[2021-07-22 23:38:11.804007 +0000] W [socket.c:3434:socket_connect] 0-glusterfs: Error disabling sockopt IPV6_V6ONLY: "Operation not supported"**
> >> [2021-07-22 23:38:11.906865 +0000] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=0}]
> >>
> >> **Mandatory info:**
> >> **- The output of the `gluster volume info` command**:
> >> gluster volume info
> >>
> >> Volume Name: ap
> >> Type: Replicate
> >> Volume ID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> >> Status: Started
> >> Snapshot Count: 0
> >> Number of Bricks: 1 x 4 = 4
> >> Transport-type: tcp
> >> Bricks:
> >> Brick1: nexus2:/mnt/nexus2_block1/ap
> >> Brick2: forge:/mnt/forge_block1/ap
> >> Brick3: hive:/mnt/hive_block1/ap
> >> Brick4: citadel:/mnt/citadel_block1/ap
> >> Options Reconfigured:
> >> performance.client-io-threads: on
> >> nfs.disable: on
> >> storage.fips-mode-rchecksum: on
> >> transport.address-family: inet
> >> cluster.self-heal-daemon: enable
> >> client.event-threads: 4
> >> cluster.data-self-heal-algorithm: full
> >> cluster.lookup-optimize: on
> >> cluster.quorum-count: 1
> >> cluster.quorum-type: fixed
> >> cluster.readdir-optimize: on
> >> cluster.heal-timeout: 1800
> >> disperse.eager-lock: on
> >> features.cache-invalidation: on
> >> features.cache-invalidation-timeout: 600
> >> network.inode-lru-limit: 500000
> >> network.ping-timeout: 7
> >> network.remote-dio: enable
> >> performance.cache-invalidation: on
> >> performance.cache-size: 1GB
> >> performance.io-thread-count: 4
> >> performance.md-cache-timeout: 600
> >> performance.rda-cache-limit: 256MB
> >> performance.read-ahead: off
> >> performance.readdir-ahead: on
> >> performance.stat-prefetch: on
> >> performance.write-behind-window-size: 32MB
> >> server.event-threads: 4
> >> cluster.background-self-heal-count: 1
> >> performance.cache-refresh-timeout: 10
> >> features.ctime: off
> >> cluster.granular-entry-heal: enable
> >>
> >> - The output of the gluster volume status command:
> >> gluster volume status
> >> Status of volume: ap
> >> Gluster process                             TCP Port  RDMA Port  Online  Pid
> >> ------------------------------------------------------------------------------
> >> Brick forge:/mnt/forge_block1/ap            49152     0          Y       2622
> >> Self-heal Daemon on localhost               N/A       N/A        N       N/A
> >>
> >> Task Status of Volume ap
> >> ------------------------------------------------------------------------------
> >> There are no active volume tasks
> >>
> >> - The output of the gluster volume heal command:
> >> gluster volume heal ap enable
> >> Enable heal on volume ap has been successful
> >>
> >> gluster volume heal ap
> >> Launching heal operation to perform index self heal on volume ap has been unsuccessful:
> >> Self-heal daemon is not running. Check self-heal daemon log file.
> >>
> >> - The operating system / glusterfs version:
> >> OpenSUSE 15.2, glusterfs 9.1.
> >>
> >> Sincerely,
> >> Artem
> >>
> >> --
> >> Founder, Android Police, APK Mirror, Illogical Robot LLC
> >> beerpla.net | @ArtemR
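
For readers trying to reproduce the 7.9 -> 9.1 rolling upgrade described at the top of this thread on openSUSE Leap 15.2, a minimal per-node sketch might look like the following. The repo alias (glusterfs-9), the zypper flags, and the process-kill step are assumptions based on the generic Gluster upgrade procedure, not commands taken from these messages; the repo URL is assumed to be the directory one level above the x86_64/ path quoted above. As noted in the thread, "Locking failed" / "RPC procedure 7 not available" messages are expected while the pool runs mixed versions.

# Run on one node at a time; per the thread, the volume stays available from the other nodes meanwhile.
zypper addrepo https://download.opensuse.org/repositories/home:/glusterfs:/Leap15.2-9/openSUSE_Leap_15.2/ glusterfs-9   # assumed alias
zypper refresh
systemctl stop glusterd
pkill glusterfsd; pkill glusterfs       # also stops brick/shd processes (and any local fuse mounts) on this node
zypper dup --from glusterfs-9 --allow-vendor-change
systemctl start glusterd
gluster peer status                     # wait for the node to rejoin the pool
gluster volume heal ap                  # then kick off an index heal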
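
On Strahil's suggestion above (and Artem's question about using IPv4 only): the address family can be pinned in /etc/glusterfs/glusterd.vol for glusterd itself, and per volume via a volume option. A minimal sketch, assuming the stock glusterd.vol layout; only the transport.address-family line is the actual change:

volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket
    option transport.address-family inet
    # ... keep the rest of the options from your existing file ...
end-volume

After editing the file on every node, restart glusterd (systemctl restart glusterd). The per-volume equivalent, already set to inet in the volume info quoted above, is: gluster volume set ap transport.address-family inet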
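
On the repeated getaddrinfo errors in the quoted bug report: family=10 is AF_INET6, so gf_resolve_ip6 was asking the resolver for IPv6 addresses of the peer hostnames. A quick, generic way to check what each node actually resolves (the address below is a placeholder, not one taken from this thread):

getent ahosts nexus2        # everything the resolver returns for the peer name
getent ahostsv4 nexus2      # the IPv4-only view
# If the names resolve only over IPv6, or not at all, explicit /etc/hosts entries
# on every node keep peer resolution deterministic, e.g.:
#   192.0.2.10   nexus2     # placeholder IP - use the node's real address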
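
And on the "Self-heal daemon is not running" error in the heal output above: once the underlying connectivity/resolution problem is sorted out, these are the usual generic checks, sketched here as assumptions rather than a verified fix for this particular bug:

gluster volume status ap                    # shd is listed as "Self-heal Daemon" per node
less /var/log/glusterfs/glustershd.log      # the log file the error message refers to
gluster volume start ap force               # respawns missing shd/brick processes without touching running ones
gluster volume heal ap                      # then retry the index heal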
________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users