10G Ethernet.

Thanks,
Lohit
On May 22, 2018, 11:55 AM -0400, [email protected], wrote:

> Hi Lohit,
>
> What type of network are you using on the back end to transfer the GPFS traffic?
>
> Best,
> Dwayne
>
> From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
> Sent: Tuesday, May 22, 2018 1:13 PM
> To: gpfsug main discussion list <[email protected]>
> Subject: [gpfsug-discuss] Critical Hang issues with GPFS 5.0. Downgrading from GPFS 5.0.0-2 to GPFS 4.2.3.2
>
> Hello All,
>
> We upgraded from GPFS 4.2.3.2 to GPFS 5.0.0-2 about a month ago. We have not yet converted the filesystem from version 4.2.2.2 to 5 (that is, we have not run the mmchconfig release=LATEST command).
> Right after the upgrade, we started seeing many "ps hangs" across the cluster. All of the "ps hangs" happen when jobs run that involve a Java process or many Java threads (for example, GATK).
> The hangs are fairly random and show no particular pattern, except that we know they are related to Java, or to jobs reading from directories with about 600,000 files.
>
> I raised an IBM critical service request about this a month ago - PMR: 24090,L6Q,000.
> According to the ticket, however, they feel it might not be related to GPFS, although we are sure these hangs started to appear only after we upgraded from GPFS 4.2.3.2 to 5.0.0-2.
>
> One reason we are unable to prove that it is GPFS is that we cannot capture any logs or traces from GPFS once the hang happens. Even the GPFS trace commands hang once "ps hangs", so it is difficult to get any dumps from GPFS.
>
> Also, according to the IBM ticket, they have seen a "ps hang" issue before, and running the mmchconfig release=LATEST command should resolve it.
> However, we are not comfortable making the permanent change to filesystem version 5, and since we see no near-term solution to these hangs, we are thinking of downgrading to GPFS 4.2.3.2, the previous state in which we know the cluster was stable.
>
> Can downgrading GPFS take us back to exactly the previous GPFS config state?
> With respect to downgrading from 5 to 4.2.3.2: is it just a matter of reinstalling all rpms at the previous version, or is there anything else I need to take care of in the GPFS configuration?
> I believe GPFS 5.0 may have updated internal default GPFS configuration parameters, and I am not sure whether downgrading GPFS will change them back to their GPFS 4.2.3.2 values.
>
> Our previous state:
>
> 2 storage clusters - 4.2.3.2
> 1 compute cluster - 4.2.3.2 (remote mounts the above 2 storage clusters)
>
> Our current state:
>
> 2 storage clusters - 5.0.0.2 (filesystem version - 4.2.2.2)
> 1 compute cluster - 5.0.0.2
>
> Do we need to downgrade all of the clusters to return to the previous state, or is it enough to downgrade just the compute cluster?
>
> Any advice on the best steps forward would be greatly appreciated.
>
> Thanks,
>
> Lohit
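Since the GPFS trace commands themselves hang once "ps hangs", one way to still capture kernel stacks of the blocked tasks is the standard Linux sysrq facility, which does not go through ps or any GPFS tooling. A minimal sketch, assuming root on an affected node and that sysrq is permitted there; <PID> is a placeholder for a known-hung process:

    # Allow sysrq requests (if not already enabled)
    echo 1 > /proc/sys/kernel/sysrq

    # Dump kernel stacks of all tasks in uninterruptible (D) state
    # into the kernel ring buffer, then read them back
    echo w > /proc/sysrq-trigger
    dmesg | tail -n 200

    # For a single known-hung process, read its kernel stack
    # directly from /proc (avoids invoking ps at all)
    cat /proc/<PID>/stack

The blocked-task stacks usually show whether the hung processes are parked in a GPFS/mmfs code path or elsewhere in the kernel, which may be enough evidence to move the PMR along even without a GPFS trace.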
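On the release=LATEST question: the cluster release level and the on-disk filesystem format are two separate things, and it can be worth confirming where each currently stands before committing to anything. A minimal sketch, where gpfs01 stands in for the real device name:

    # Minimum release level currently committed for the cluster
    mmlsconfig minReleaseLevel

    # On-disk format version of a given filesystem
    mmlsfs gpfs01 -V

    # Raise the cluster release level (what IBM suggested);
    # documented as a one-way change
    mmchconfig release=LATEST

    # Separate step that upgrades the on-disk filesystem format;
    # irreversible and NOT implied by release=LATEST
    mmchfs gpfs01 -V full

In the state described above (filesystems still at format 4.2.2.2), running only mmchconfig release=LATEST would not by itself rewrite the filesystem format, though neither step can be undone once committed.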
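On whether reinstalling the 4.2.3.2 rpms restores the previous configuration defaults: one way to find out concretely is to snapshot the configuration values in effect on both sides of the change and diff them. A minimal sketch, assuming root on one node per cluster; the output paths are arbitrary:

    # Explicitly set (non-default) cluster configuration
    mmlsconfig > /root/mmlsconfig.5.0.0-2.txt

    # Full set of configuration values in effect on this node,
    # including built-in defaults
    mmdiag --config > /root/mmdiag-config.5.0.0-2.txt

    # ... downgrade the rpms, restart GPFS, then repeat ...
    # mmlsconfig > /root/mmlsconfig.4.2.3.2.txt
    # mmdiag --config > /root/mmdiag-config.4.2.3.2.txt

    # Any default that 5.0 changed and the downgrade did not
    # restore will show up in this diff
    diff /root/mmdiag-config.5.0.0-2.txt /root/mmdiag-config.4.2.3.2.txt

Because mmdiag --config reports the values actually in effect rather than only the explicitly set ones, it catches internal defaults that mmlsconfig alone would not show.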
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
