Re: [gpfsug-discuss] Protection against silent data corruption

Achim Rehor Thu, 02 Jun 2022 10:05:31 -0700

Hi Stephan,

Yes, there is; see the mmchconfig man page:


nsdCksumTraditional
This attribute enables checksum data-integrity checking between a traditional 
NSD client node and its NSD server. Valid values are yes and no. The default 
value is no.
(Traditional in this context means that the NSD client and server are 
configured with IBM Spectrum Scale rather than with IBM Spectrum Scale RAID.
The latter is a component of IBM Elastic Storage Server (ESS) and of IBM GPFS 
Storage Server (GSS).)

The checksum procedure detects any corruption by the network of the data in the 
NSD RPCs that are exchanged between the NSD client and the
server. A checksum error triggers a request to retransmit the message.

When this attribute is enabled on a client node, the client indicates in each 
of its requests to the server that it is using checksums. The server uses 
checksums only in
response to client requests in which the indicator is set. A client node that 
accesses a file system that belongs to another cluster can use checksums in the 
same way.

You can change the value of this attribute for an entire cluster without 
shutting down the mmfsd daemon, or for one or more nodes without restarting the 
nodes.

Note:
* Enabling this feature can result in significant I/O performance degradation 
and a considerable increase in CPU usage.

* To enable checksums for a subset of the nodes in a cluster, issue a command 
like the following one:
   mmchconfig nsdCksumTraditional=yes -i -N <subset-of-nodes>

   The -N flag is valid for this attribute.
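
As a minimal sketch of the note above, the following script enables the attribute on a subset of nodes and then displays the configured value. The node names client01 and client02 are placeholders, and the run helper is a hypothetical dry-run wrapper that is not part of Spectrum Scale: by default it only prints the commands, and it executes them only when RUN=1 is set.

```shell
#!/bin/sh
# Placeholder node list (assumption); replace with your own NSD client nodes.
NODES="${NODES:-client01,client02}"

# Hypothetical dry-run wrapper: prints the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Enable NSD checksums on the selected nodes only.
# -i makes the change take effect immediately and keeps it across restarts.
run mmchconfig nsdCksumTraditional=yes -i -N "$NODES"

# Display the configured value of the attribute afterwards.
run mmlsconfig nsdCksumTraditional
```

Given the performance warning in the note, it may be sensible to start with a small set of client nodes and measure the I/O and CPU impact before enabling the attribute cluster-wide.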


--

Mit freundlichen Grüßen / Kind regards

Achim Rehor

Technical Support Specialist Spectrum Scale and ESS (SME)
Advisory Product Services Professional
IBM Systems Storage Support - EMEA

[email protected] +49-170-4521194
IBM Deutschland GmbH
Chairman of the Supervisory Board: Sebastian Krause
Management: Gregor Pillen (Chairman), Nicole Reimer,
Gabriele Schwarenthorer, Christine Rupp, Frank Theisen
Registered office: Ehningen / Registration court: Amtsgericht
Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940


-----Original Message-----
From: Stephan Graf <[email protected]>
Reply-To: gpfsug main discussion list <[email protected]>
To: gpfsug-discuss <[email protected]>
Subject: [EXTERNAL] [gpfsug-discuss] Protection against silent data corruption
Date: Thu, 02 Jun 2022 16:31:43 +0200

Hi,

I am wondering if there is an option in SS to enable some checking to
detect silent data corruption.

From GNR I know that there is End-to-End integrity, so a checksum is
stored in addition.

The background is that we are facing an issue where, for some files (which
have data replication = 2), mmrestripefile reports that one block does
not match its copy (the storage cluster is running SS without GNR).
We have validated that the copied block is fine, but the original one is
broken (and this is what is returned on read access).
SS right now in our installation is unable to determine which is the
correct one.
Is there any option to enable this kind of feature in SS? If not, does
it make sense to create an "IDEA" for it?

Stephan

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org
