mmvdisk rg change --active is a very common operation. It should be perfectly safe.
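Roughly, the pattern I'd expect for a maintenance window is the following. This is only a sketch, untested from here, using the RG and node names from your output below:

# temporarily make ess-n2-hs serve all log groups of the shared RG
mmvdisk rg change --rg ess3500_ess_n1_hs_ess_n2_hs --active ess-n2-hs

# ... do the maintenance on the other server ...

# restore the normal balance of two user log groups per server
mmvdisk rg change --rg ess3500_ess_n1_hs_ess_n2_hs --active DEFAULT

And as the doc text you quote says, having one server idle is only meant as a temporary maintenance state, so I'd always finish with --active DEFAULT once both servers are healthy again.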
mmvdisk rg change --restart is an option I didn't know about, so it's likely not something that's commonly used. I wouldn't be too worried about losing the RGs; I don't think that's something that can happen without support being able to help get them back online. I once had a situation similar to yours, with an RG not wanting to become active again during an upgrade (around 5 years ago), and I believe we solved it by rebooting the io-nodes. It must have been some stuck process I was unable to understand, or maybe a CCR issue caused by some nodes being way back-level? I don't remember.
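If it happens again, before deleting anything I would go through the basic checks from my previous mail (quoted below), and then fall back to restarting GPFS on the two recovery group servers, which is essentially what the shared recovery group troubleshooting doc you link further down describes. Only a sketch, untested from here, using the node and RG names from your listing:

# basic cluster health first
mmnetverify -N all
mmhealth node show -N all
# and check /var/adm/ras/mmfs.log.latest on both ess-n1-hs and ess-n2-hs

# fallback per the troubleshooting doc: restart GPFS on the two RG servers
mmshutdown -N ess-n1-hs,ess-n2-hs
mmstartup -N ess-n1-hs,ess-n2-hs

# then see whether the RG picks up active servers again
mmvdisk server list --rg ess3500_ess_n1_hs_ess_n2_hs
mmvdisk recoverygroup list --rg ess3500_ess_n1_hs_ess_n2_hs

If it still shows no active servers after that, I don't think you can get much further without support.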
-jf

On Thu, Aug 24, 2023 at 20:22, Walter Sklenka <[email protected]> wrote:

> Hi Jan-Frode!
>
> We did the "switch" with:
>
> mmvdisk rg change --rg ess3500_ess_n1_hs_ess_n2_hs --active ess-n2-hs
>
> Both nodes were up and we did not see any anomalies, and the rg was
> successfully created with the log groups.
>
> Maybe the method of switching the rg (with --active) is a bad idea? Because
> the manual says:
>
> https://www.ibm.com/docs/en/ess/6.1.6_lts?topic=command-mmvdisk-recoverygroup
>
> "For a shared recovery group, the mmvdisk recoverygroup change --active
> Node command means to make the specified node the server for all four
> user log groups and the root log group. The specified node therefore
> temporarily becomes the sole active server for the entire shared recovery
> group, leaving the other server idle. This should only be done in unusual
> maintenance situations, since it is normally considered an error condition
> for one of the servers of a shared recovery group to be idle. If the
> keyword DEFAULT is used instead of a server name, it restores the normal
> default balance of log groups, making each of the two servers responsible
> for two user log groups."
>
> This was the state before we tried to restart; no logs are seen, and we got
> "unable to reset server list":
>
> ~]$ sudo mmvdisk server list --rg ess3500_ess_n1_hs_ess_n2_hs
>
>    node
>  number  server     active  remarks
>  ------  ---------  ------  ----------
>      98  ess-n1-hs  yes     configured
>      99  ess-n2-hs  yes     configured
>
> ~]$ sudo mmvdisk recoverygroup list --rg ess3500_ess_n1_hs_ess_n2_hs
>
>                                                                                              needs    user
>  recovery group               node class                           active  current or master server  service  vdisks  remarks
>  ---------------------------  -----------------------------------  ------  ------------------------  -------  ------  -------
>  ess3500_ess_n1_hs_ess_n2_hs  ess3500_mmvdisk_ess_n1_hs_ess_n2_hs  no      -                          unknown  0
>
> ~]$ sudo mmvdisk rg change --rg ess3500_ess_n1_hs_ess_n2_hs --restart
> mmvdisk:
> mmvdisk:
> mmvdisk: Unable to reset server list for recovery group 'ess3500_ess_n1_hs_ess_n2_hs'.
> mmvdisk: Command failed. Examine previous error messages to determine cause.
>
> Well, in the logs we did not find anything.
>
> And finally we had to delete the rg, because we urgently needed new space.
>
> With the new one we tested again: we did mmshutdown/mmstartup, and also
> used the --active flag, and all went OK. And now we have data on the rg.
>
> But we are concerned that this might happen again sometime and that we
> might not be able to re-enable the rg, leading to a disaster.
>
> So if you have any idea I would appreciate it very much 😊
>
> Best regards
> Walter
>
> From: gpfsug-discuss <[email protected]> On Behalf Of Jan-Frode Myklebust
> Sent: Thursday, 24 August 2023 14:51
> To: gpfsug main discussion list <[email protected]>
> Subject: Re: [gpfsug-discuss] FW: ESS 3500-C5 : rg has resigned permanently
>
> It does sound like "mmvdisk rg change --restart" is the "varyon" command
> you're looking for.. but it's not clear why it's failing. I would start by
> looking at whether there are any lower-level issues with your cluster. Are
> your nodes healthy on a GPFS level? Does "mmnetverify -N all" say the
> network is OK? Is "mmhealth node show -N all" not indicating any issues?
> Check mmfs.log.latest?
>
> On Thu, Aug 24, 2023 at 1:41 PM Walter Sklenka <[email protected]> wrote:
>
> Hi!
>
> Does someone eventually have experience with ESS 3500 (no hybrid config,
> only NLSAS with 5 enclosures)?
>
> We have issues with a shared recovery group. After creating it we made a
> test of setting only one node active (maybe not an optimal idea).
>
> But since then the recovery group is down.
>
> We have created a PMR but have not gotten any response so far.
>
> The rg has no vdisks of any filesystem.
>
> [gpfsadmin@hgess02-m ~]$ sudo mmvdisk rg change --rg ess3500_hgess02_n1_hs_hgess02_n2_hs --restart
> mmvdisk:
> mmvdisk:
> mmvdisk: Unable to reset server list for recovery group 'ess3500_hgess02_n1_hs_hgess02_n2_hs'.
> mmvdisk: Command failed. Examine previous error messages to determine cause.
>
> We also tried:
>
> 2023-08-21_16:57:26.174+0200: [I] Command: tsrecgroupserver ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l root hgess02-n2-hs.invalid
> 2023-08-21_16:57:26.201+0200: [I] Recovery group ess3500_hgess02_n1_hs_hgess02_n2_hs has resigned permanently
> 2023-08-21_16:57:26.201+0200: [E] Command: err 2: tsrecgroupserver ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l root hgess02-n2-hs.invalid
> 2023-08-21_16:57:26.201+0200: Specified entity, such as a disk or file system, does not exist.
> 2023-08-21_16:57:26.207+0200: [I] Command: tsrecgroupserver ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l LG001 hgess02-n2-hs.invalid
> 2023-08-21_16:57:26.207+0200: [E] Command: err 212: tsrecgroupserver ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l LG001 hgess02-n2-hs.invalid
> 2023-08-21_16:57:26.207+0200: The current file system manager failed and no new manager will be appointed. This may cause nodes mounting the file system to experience mount failures.
> 2023-08-21_16:57:26.213+0200: [I] Command: tsrecgroupserver ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l LG002 hgess02-n2-hs.invalid
> 2023-08-21_16:57:26.213+0200: [E] Command: err 212: tsrecgroupserver ess3500_hgess02_n1_hs_hgess02_n2_hs -f -l LG002 hgess02-n2-hs.invalid
> 2023-08-21_16:57:26.213+0200: The current file system manager failed and no new manager will be appointed. This may cause nodes mounting the file system to experience mount failures.
>
> For us it is crucial to know what we can do if this happens again (it has
> no vdisks yet, so it is not critical).
>
> Do you know: is there an undocumented way to "vary on", or activate, a
> recovery group again?
>
> The doc:
>
> https://www.ibm.com/docs/en/ess/6.1.6_lts?topic=rgi-recovery-group-issues-shared-recovery-groups-in-ess
>
> tells us to mmshutdown and mmstartup, but the RGCM says nothing.
>
> When trying to execute any vdisk command it only says "rg down". We have no
> idea how we could recover from that without deleting the rg (I hope it will
> never happen once we have vdisks on it).
>
> Have a nice day
> Walter
>
> Kind regards
> Walter Sklenka
> Technical Consultant
>
> EDV-Design Informationstechnologie GmbH
> Giefinggasse 6/1/2, A-1210 Wien
> Tel: +43 1 29 22 165-31
> Fax: +43 1 29 22 165-90
> E-Mail: [email protected]
> Internet: www.edv-design.at
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org
