Interesting. We just deployed an ESS here and appear to be running into a very similar problem with the GUI refresh. rinv takes about 45 seconds on my ppc64le nodes even when they are idle. I opened a support case on this yesterday evening. We're on ESS 5.3.4 as well, and I will wait to see what support says.
Ed Wahl
Ohio Supercomputer Center

-----Original Message-----
From: [email protected] <[email protected]> On Behalf Of Ulrich Sibiller
Sent: Thursday, January 30, 2020 9:44 AM
To: [email protected]
Subject: Re: [gpfsug-discuss] gui_refresh_task_failed for HW_INVENTORY with two active GUI nodes

On 1/29/20 2:05 PM, Billich Heinrich Rainer (ID SD) wrote:
> Hello,
>
> Can I change the times at which the GUI runs HW_INVENTORY and related tasks?
>
> We frequently get messages like
>
>   gui_refresh_task_failed   GUI   WARNING   12 hours ago
>   The following GUI refresh task(s) failed: HW_INVENTORY
>
> The tasks fail due to timeouts. Running the task manually succeeds most
> of the time. We run two GUI nodes per cluster, and I noticed that both
> servers seem to run HW_INVENTORY at exactly the same time, which may
> lead to locking or congestion issues. In fact, the logs show messages
> like
>
>   EFSSA0194I Waiting for concurrent operation to complete.
>
> The GUI calls 'rinv' on the xCAT servers. rinv for a single
> little-endian server takes a long time (about 2-3 minutes), while it
> finishes in about 15 s for a big-endian server.
>
> Hence the long runtime of rinv on little-endian systems may be an
> issue, too.
>
> We run 5.0.4-1 efix9 on the GUI and ESS 5.3.4.1 on the GNR systems
> (5.0.3.2 efix4). We run a mix of ppc64 and ppc64le systems, with a
> separate xCAT/EMS server for each type. The GUI nodes are ppc64le.
>
> We have seen this issue with several GPFS versions on the GUI and with
> at least two ESS/xCAT versions.
>
> Just to be sure, I purged the PostgreSQL tables.
>
> I tried
>
>   /usr/lpp/mmfs/gui/cli/lstasklog HW_INVENTORY
>
>   /usr/lpp/mmfs/gui/cli/runtask HW_INVENTORY --debug
>
> and also tried to read the logs in /var/log/cnlog/mgtsrv/, but they are
> difficult to interpret.

I have seen the same on ppc64le. From time to time it recovers, but then it starts again. The timeouts are okay; it is the hardware.
I have opened a call with IBM, and they suggested upgrading to ESS 5.3.5 because of the new firmware, which I am currently doing. I can dig out more details if you want.

Uli
--
Science + Computing AG
Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke
Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein
Vorsitzender des Aufsichtsrats/Chairman of the Supervisory Board: Philippe Miltin
Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern
Sitz/Registered Office: Tuebingen
Registergericht/Registration Court: Stuttgart
Registernummer/Commercial Register No.: HRB 382196
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
https://urldefense.com/v3/__http://gpfsug.org/mailman/listinfo/gpfsug-discuss__;!!KGKeukY!gqw1FGbrK5S4LZwnuFxwJtT6l9bm5S5mMjul3tadYbXRwk0eq6nesPhvndYl$
