Years back I ran a trial of a commercial software solution on OS X that addressed this issue. It worked! It was not cheap, and they probably no longer support it anyway. It might have been from a company called Group Logic.

I would suggest not exposing HSM-enabled file systems (in particular ones using tape on the back end) to your general CIFS (or even GPFS/NFS) clients. It produced years (2011-2015) of frustration with recall storms that made everyone mad. If someone else has had success, I think we'd all like to know how they did it... but we gave up on that. In the end I would suggest setting up an explicit archive location using HSM tape (or low cost, high density disk) that is not exposed to your traditional GPFS/CIFS/NFS clients, and that users must deliberately access (think portal) to check cold data in and out and stage it to their primary workspace. It is possible you considered this idea, or some variation of it, and rejected it for good reason (e.g. more pain for the users to stage data over from cold storage to the primary workspace).

Bill Pappas
901-619-0585
[email protected]

________________________________
From: [email protected] <[email protected]> on behalf of [email protected] <[email protected]>
Sent: Tuesday, July 10, 2018 9:50 AM
To: [email protected]
Subject: gpfsug-discuss Digest, Vol 78, Issue 32

Today's Topics:

   1. Re: preventing HSM tape recall storms (Jonathan Buzzard)
   2. Re: What NSDs does a file have blocks on? (Marc A Kaplan)
   3. Re: High I/O wait times (Jonathan Buzzard)
   4. Re: Allocation map limits - any way around this? (Uwe Falke)
   5. Same file opened by many nodes / processes (Peter Childs)

----------------------------------------------------------------------

Message: 1
Date: Tue, 10 Jul 2018 14:00:48 +0100
From: Jonathan Buzzard <[email protected]>
To: gpfsug main discussion list <[email protected]>
Subject: Re: [gpfsug-discuss] preventing HSM tape recall storms
Message-ID: <[email protected]>
Content-Type: text/plain; charset="UTF-8"

On Mon, 2018-07-09 at 14:57 -0400, Frederick Stock wrote:
> Another option is to request Apple to support the OFFLINE flag in the
> SMB protocol. The more Mac customers making such a request (I have
> asked others to do likewise) might convince Apple to add this
> checking to their SMB client.

And we have a winner. The only workable solution is to get Apple's
Finder to support the OFFLINE flag. However, good luck getting Apple to
actually do anything.

An alternative approach might be to somehow detect that the connecting
client is running macOS and prohibit recalls for it. However, I am not
sure the Samba team would be keen on accepting such patches unless it
could be done in, say, a VFS module.

JAB.

--
Jonathan A. Buzzard                          Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
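
For anyone who wants to try blocking recalls at the Samba layer rather than waiting
for Apple: a minimal smb.conf sketch, assuming the share is exported through Samba's
vfs_gpfs module. The share name and path below are made up, and the option names
(gpfs:hsm, gpfs:recalls) should be checked against the vfs_gpfs(8) man page for your
Samba release:

    [projects]
        path = /gpfs/fs1/projects
        vfs objects = gpfs
        # announce that this file system is HSM-managed
        # (see vfs_gpfs(8) for the exact semantics on your release)
        gpfs:hsm = yes
        # do not recall on open; opening a migrated file fails with
        # access denied instead of triggering a tape recall
        gpfs:recalls = no

The obvious drawback is that this is per share, not per client: well-behaved clients
lose transparent recall too, so it fits best on shares that are mostly mounted by
macOS clients.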

------------------------------

Message: 2
Date: Tue, 10 Jul 2018 09:08:45 -0400
From: "Marc A Kaplan" <[email protected]>
To: gpfsug main discussion list <[email protected]>
Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on?
Message-ID: <of5287542c.d657be67-on852582c6.00479067-852582c6.00483...@notes.na.collabserv.com>
Content-Type: text/plain; charset="utf-8"

As long as we're giving hints...

tsdbfs has several subcommands that might be helpful. I like "inode",
but there's also "listda". Subcommand "desc" will show you the structure
of the file system; under "disks:" you will see which disk numbers are
which NSDs.

Have fun, but DO NOT use any of the *patch* subcommands!

From: Simon Thompson <[email protected]>
To: gpfsug main discussion list <[email protected]>
Date: 07/09/2018 05:21 PM
Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on?
Sent by: [email protected]

I was going to say something like that ... e.g.

blockaddr 563148261
Inode 563148261 snap 0 offset 0 N=2  1:45255923200  13:59403784320

1: and 13: in the output are the NSD disk devices for inode 563148261.

Simon

From: <[email protected]> on behalf of "[email protected]" <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, 9 July 2018 at 22:04
To: "[email protected]" <[email protected]>
Subject: Re: [gpfsug-discuss] What NSDs does a file have blocks on?

(psss...) tsdbfs

Not responsible for anything bad that happens...!
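
Putting the two hints together, a rough sketch of the whole lookup. The file system
name (fs1) and path are made up, the exact blockaddr arguments may differ between
releases, and tsdbfs must be run as root; stick strictly to the read-only subcommands:

    # find the inode number of the file in question
    ls -i /gpfs/fs1/scratch/bigfile
        563148261 /gpfs/fs1/scratch/bigfile

    # ask tsdbfs where that inode's data blocks live
    echo "blockaddr 563148261" | /usr/lpp/mmfs/bin/tsdbfs fs1
        Inode 563148261 snap 0 offset 0 N=2  1:45255923200  13:59403784320

    # map disk numbers 1 and 13 back to NSD names: "desc" lists them under
    # "disks:", or cross-check against the disk id column from mmlsdisk -L
    echo "desc" | /usr/lpp/mmfs/bin/tsdbfs fs1
    mmlsdisk fs1 -L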

------------------------------

Message: 3
Date: Tue, 10 Jul 2018 14:12:02 +0100
From: Jonathan Buzzard <[email protected]>
To: gpfsug main discussion list <[email protected]>
Subject: Re: [gpfsug-discuss] High I/O wait times
Message-ID: <[email protected]>
Content-Type: text/plain; charset="UTF-8"

On Mon, 2018-07-09 at 21:44 +0000, Buterbaugh, Kevin L wrote:

[SNIP]

> Interestingly enough, one user showed up waaaayyyyyy more often than
> anybody else. And many times she was on a node with only one other
> user who we know doesn't access the GPFS filesystem, and other times
> she was the only user on the node.

On our old HPC system, which had been running fine for three years, I
have seen a particular user with a particular piece of software (and
presumably a particular access pattern) trigger a firmware bug in a SAS
drive (local disk to the node) that caused it to go offline (dead to the
world, power/presence LED off); only a power cycle of the node would
bring it back. At first we thought the drives were failing, but in the
end a firmware update to the drives fixed it and they were fine.

The moral of the story is: don't rule out wacky access patterns from a
single user causing problems.

JAB.

--
Jonathan A. Buzzard                          Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

------------------------------

Message: 4
Date: Tue, 10 Jul 2018 16:28:57 +0200
From: "Uwe Falke" <[email protected]>
To: gpfsug main discussion list <[email protected]>
Subject: Re: [gpfsug-discuss] Allocation map limits - any way around this?
Message-ID: <of9fba4dcc.1c355d67-onc12582c6.004f6441-c12582c6.004f8...@notes.na.collabserv.com>
Content-Type: text/plain; charset="ISO-8859-1"

Hi Bob,

are you sure the first NSD added was 1 TB? Whenever I created a file
system, the maximum NSD size was way larger than the one I added
initially, not just fourfold.

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
High Performance Computing Services / Integrated Technology Services / Data Center Services
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland
Rathausstr. 7
09111 Chemnitz
Phone: +49 371 6978 2165
Mobile: +49 175 575 2877
E-Mail: [email protected]
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland Business & Technology Services GmbH / Geschäftsführung: Thomas Wolter, Sven Schooß
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122

From: "Oesterlin, Robert" <[email protected]>
To: gpfsug main discussion list <[email protected]>
Date: 10/07/2018 13:59
Subject: [gpfsug-discuss] Allocation map limits - any way around this?
Sent by: [email protected]

The file system was originally created with four 1TB NSDs and I want to
move it to one 5TB NSD. Any way around this error?

mmadddisk fs1 -F new.nsd

The following disks of proserv will be formatted on node srv-gpfs06:
    stor1v5tb85: size 5242880 MB
Extending Allocation Map
Disk stor1v5tb85 cannot be added to storage pool Plevel1.
Allocation map cannot accommodate disks larger than 4194555 MB.
Checking Allocation Map for storage pool Plevel1
mmadddisk: tsadddisk failed.
Verifying file system configuration information ...
mmadddisk: Propagating the cluster configuration data to all
  affected nodes.  This is an asynchronous process.
mmadddisk: Command failed. Examine previous error messages to determine cause.

Bob Oesterlin
Sr Principal Storage Engineer, Nuance

------------------------------

Message: 5
Date: Tue, 10 Jul 2018 14:50:54 +0000
From: Peter Childs <[email protected]>
To: "[email protected]" <[email protected]>
Subject: [gpfsug-discuss] Same file opened by many nodes / processes
Message-ID: <[email protected]>
Content-Type: text/plain; charset="utf-8"

We have a situation where the same file is being read by around 5000
"jobs". This is an array job in UGE with a tc set, so the file in
question is being opened by about 100 processes/jobs at the same time.

It's a ~200GB file, so copying the file locally first is not an easy
answer, and these jobs are causing issues with mmbackup scanning the
file system, in that the scan is taking 3 hours instead of the normal
40-60 minutes.

This is read-only access to the file; I don't know the specifics of the
job. It looks like the metanode is moving around a fair amount (given
what I can see from mmfsadm saferdump file).

I'm wondering if there is anything we can do to improve things, or that
can be tuned within GPFS. I don't think we have an issue with token
management, but would increasing maxFilesToCache on our token manager
node help, say?

Is there anything else I should look at to try and allow GPFS to share
this file better?

Thanks in advance

Peter Childs

--
Peter Childs
ITS Research Storage
Queen Mary, University of London
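
If it comes to trying a larger maxFilesToCache, a minimal sketch of how one might
inspect and raise it. This is not a definitive answer to the metanode question;
"managerNodes" is a hypothetical node class and the values are only illustrative:

    # see what the cluster currently has configured
    mmlsconfig maxFilesToCache
    mmlsconfig maxStatCache

    # raise the limits on the manager / token-server nodes only
    # (values here are illustrative, not a recommendation)
    mmchconfig maxFilesToCache=131072,maxStatCache=65536 -N managerNodes

    # as far as I know these settings only take effect after the GPFS daemon
    # is restarted on the affected nodes (mmshutdown / mmstartup)

Bear in mind that each cached entry costs memory on the node and token state on the
token server, so large increases should be sized against the available RAM.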

------------------------------

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

End of gpfsug-discuss Digest, Vol 78, Issue 32
**********************************************
