The rationale for my suggestion doesn't have much to do with the central DNS 
server, but everything to do with the DNS client side of the service.
Suppose the cluster is very busy at times, with a number of nodes handling 
1000+ IOPS, for instance. The OS on such a client can barely spare a cycle to 
query the DNS server for the IP associated with the name of the interface 
leading to the GPFS infrastructure, or even to process the response when it 
returns, because that happens on the same interface where it's having 
contention while trying to process all the GPFS data transactions. That can 
produce temporary catch-22 situations. These can generate a backlog of 
waiters, and eventually the expulsion of some nodes when the cluster managers 
don't hear from them in reasonable time.
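
One way to check whether client-side lookups stall on a busy node is to time them directly. What follows is a minimal Python sketch, not anything GPFS itself does. The function name `time_lookup` is hypothetical, and the resolver is passed in as a parameter (an assumption made purely so the timing logic can be exercised without a network).

```python
import socket
import time


def time_lookup(name, resolver=socket.gethostbyname):
    """Resolve `name` and return (address_or_None, elapsed_seconds).

    `resolver` defaults to socket.gethostbyname, which follows the node's
    normal resolution path (hosts file, DNS, as configured locally).
    """
    start = time.monotonic()
    try:
        address = resolver(name)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        address = None
    elapsed = time.monotonic() - start
    return address, elapsed
```

Running this in a loop against the name of the GPFS-facing interface while the node is under heavy I/O would show whether lookups slow down or fail at those times; what counts as "too slow" is left to the reader.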

It doesn't really matter if you have a central DNS server on steroids.

Jaime

On 5/10/2020 03:35:29, TURNER Aaron wrote:
Following on from Jonathan Buzzard's comments, I'd also like to point out that 
I've never known a central DNS failure in a UK HEI for as long as I can 
remember, and it was certainly not my intention to suggest that as I think a 
central DNS issue is highly unlikely. And indeed, as I originally noted, the 
standard command-line tools on the nodes resolve the names as expected, so 
whatever is going on looks like it affects GPFS only. It may even be that the 
repetition of the domain names in the logs is just a function of something it 
is doing when logging when a node is failing to connect for some other reason 
entirely. It's just not something I recall having seen before and wanted to see 
if anyone else had seen it.
------------------------------------------------------------------------
*From:* gpfsug-discuss-boun...@spectrumscale.org 
<gpfsug-discuss-boun...@spectrumscale.org> on behalf of Jonathan Buzzard 
<jonathan.buzz...@strath.ac.uk>
*Sent:* 09 May 2020 23:22
*To:* gpfsug-discuss@spectrumscale.org <gpfsug-discuss@spectrumscale.org>
*Subject:* Re: [gpfsug-discuss] Odd networking/name resolution issue
On 09/05/2020 12:06, Jaime Pinto wrote:
DNS shouldn't be relied upon on a GPFS cluster for internal communication/management or data.


The 1980s have called and want their lack of IP resolution protocols
back :-)

I would kindly disagree. If your DNS is not working then your cluster is
fubar anyway and a zillion other things will also break very rapidly.
For us at least half of the running jobs would be dead in a few minutes
as failure to contact license servers would cause the software to stop.
All authentication and account lookup is going to fail as well.

You could distribute a hosts file, but outside of a storage-only
cluster (as opposed to one with hundreds if not thousands of compute
nodes) that is frankly madness and will inevitably come to bite you in
the ass because they *will* get out of sync. The only hosts entry we
have is for the Salt Stack host because it tries to do things before the
DNS resolvers have been set up and consequently breaks otherwise. Which
IMHO is duff on its part.
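
The "out of sync" risk can at least be detected. Below is a minimal Python sketch, under stated assumptions, that compares the entries in a hosts file against what a resolver returns. The names `parse_hosts` and `find_drift` are hypothetical, the resolver is injectable for testing, and the default `socket.gethostbyname` is IPv4-only. On a node that consults its own hosts file first the default resolver would simply echo that file back, so the comparison only means something when run where the file is not installed or with a DNS-only resolver passed in.

```python
import socket


def parse_hosts(text):
    """Map each hostname or alias in hosts-file text to its IP address."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            entries[name] = ip
    return entries


def find_drift(hosts_text, resolver=socket.gethostbyname):
    """Return {name: (hosts_ip, resolved_ip_or_None)} for every mismatch."""
    drift = {}
    for name, hosts_ip in parse_hosts(hosts_text).items():
        try:
            resolved = resolver(name)
        except OSError:
            # Name did not resolve at all; report it as drift.
            resolved = None
        if resolved != hosts_ip:
            drift[name] = (hosts_ip, resolved)
    return drift
```

An empty result means every name in the file resolves to the address the file claims; anything else lists the stale or unresolvable entries.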

I would add I can't think of a time in the last 16 years where internal
DNS at any University I have worked at has stopped working for even one
millisecond. If DNS is that flaky at your institution then I suggest
sacking the people responsible for its maintenance as being incompetent
twits. It is just such a vanishingly remote possibility that it's not
worth bothering about. Frankly an aircraft falling out of the sky and
squishing your data centre seems more likely to me.

Finally, in a world of IPv6, anything other than DNS is utter
madness IMHO.


JAB.

--
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
The University of Edinburgh is a charitable body, registered in Scotland, with 
registration number SC005336.
