Quoting Konstantin Arnold <[email protected]>:

Hi Jaime,

... maybe I can give some comments with experience from the field:
I would suggest that, after reaching a high-watermark threshold, the recall
speed be throttled to a rate lower than the migration speed (but still
high enough not to run into a timeout). I don't think it's a good idea to
send "access denied" while trying to prioritize migration. If non-IT
people saw this message, they could think the system is broken. It would
also be unclear what a batch job that has to prepare data should do; in
the worst case, processing would start with incomplete data.

I wouldn't object to any strategy that lets us empty the vase more quickly than it's being filled. It may just make the solution more complex for developers, since this feels a lot like a mini-scheduler.

On the other hand, I don't see much of an issue for non-IT people or batch jobs that depend on the data being recalled: we already enable quotas on our file systems. When quotas are reached, the system is supposed to "break" anyway for that particular user, group, or application, and they still have to handle that situation properly.
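The throttling idea above (and the mini-scheduler it implies) could be sketched roughly as follows. This is purely illustrative: the constants, function names, and rates are hypothetical, not any GPFS or Spectrum Protect API.

```python
# Illustrative sketch of the throttle Konstantin describes: once file-system
# occupancy crosses a high watermark, cap the recall rate below the migration
# rate so net occupancy falls. All names and numbers are hypothetical.

HIGH_WATERMARK = 0.90       # start throttling above 90% occupancy (assumed)
THROTTLED_RECALL_MBS = 300  # recall cap: below the measured migration drain
                            # rate, but well above any client-timeout floor

def recall_rate_limit(occupancy: float, normal_rate_mbs: float) -> float:
    """Return the allowed recall rate (MB/s) for the current occupancy."""
    if occupancy >= HIGH_WATERMARK:
        # Keep recalls slower than migration so free space grows,
        # but never zero -- clients must not hit a timeout.
        return min(normal_rate_mbs, THROTTLED_RECALL_MBS)
    return normal_rate_mbs
```

Below the watermark, recalls run at full speed; above it, they are capped so that migration can win the race without any user ever seeing "access denied".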



We are currently recalling all our data on tape to be moved to a
different system. There is 15x more data on tape than would fit on the
disk pool (and there were millions of files before we set the inode quota
to a low number). We are moving one user/project after another using
tape-ordered recalls. For that, we had to disable a policy that was
aggressively pre-migrating files and allowing us to quickly free space on
the disk pool. I must admit that it took us a while to tune the
thresholds and policies.

That is certainly an approach to consider. We still think the application should be able to properly manage occupancy on the same file system. We run a different system that also has a disk-based cache layer, and the strategy there is to keep it as full as possible (85-90%) in order to avoid retrieving data from tape whenever possible, while still leaving some cushion for newly saved data. Indeed, finding the sweet spot is a balancing act.

Thanks for the feedback
Jaime



Best
Konstantin



On 03/09/2016 01:12 PM, Jaime Pinto wrote:
Yes! A behavior along those lines would be desirable. Users understand
very well what it means for a file system to be near full.

Are there any customers already doing something similar?

Thanks
Jaime

Quoting Dominic Mueller-Wicke01 <[email protected]>:


Hi Jaime,

I see. So the recall-shutdown would be something for a short time period,
right? Just for the time it takes to migrate files out and free space. If
HSM allowed the recall-shutdown, the impact for the users would be that
each access to a migrated file would lead to an "access denied" error.
Would that be acceptable for the users?

Greetings, Dominic.

______________________________________________________________________________________________________________


Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical
Lead |
+49 7034 64 32794 | [email protected]

Vorsitzende des Aufsichtsrats: Martina Koederitz; Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen; Registergericht: Amtsgericht Stuttgart,
HRB 243294



From:    Jaime Pinto <[email protected]>
To:    Dominic Mueller-Wicke01/Germany/IBM@IBMDE
Cc:    [email protected]
Date:    08.03.2016 21:38
Subject:    Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration
            priority



Thanks for the suggestions Dominic

I remember playing around with premigrated files at the time, and that
was not satisfactory.

What we are looking for is a configuration-based parameter that will
basically break out of the "transparency for the user" mode and not
perform any further recalls, period, if/when the file system occupancy
is above a certain threshold (98%). We would not mind if instead GPFS
issued a preemptive "disk full" error message to any user/app/job relying
on those files to be recalled, so that migration on demand would have a
chance to be performed. What we would prefer is to swap precedence, i.e.,
any migration requests would be executed ahead of any recalls, at least
until a certain amount of free space on the file system has been cleared.
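The behavior Jaime asks for could be sketched like this. To be clear, this is a hypothetical gate, not an existing GPFS/HSM feature: the function name, the 98% constant, and the decision logic are all assumptions used only to make the request concrete.

```python
import errno
import os

OCCUPANCY_LIMIT = 0.98  # threshold mentioned above; the value is site policy

def handle_access(occupancy: float, file_is_migrated: bool) -> str:
    """Decide what a (hypothetical) recall gate would do on file access."""
    if file_is_migrated and occupancy >= OCCUPANCY_LIMIT:
        # Break transparency: fail fast with a familiar "disk full" error
        # instead of recalling, so threshold migration can drain the pool.
        raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
    # Otherwise behave as usual: recall migrated files, read resident ones.
    return "recall" if file_is_migrated else "read"
```

The point of raising ENOSPC rather than "access denied" is that users and batch jobs already know how to interpret a file system that is full, as noted earlier in the thread.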

It's really important that this type of feature be present for us to
reconsider the TSM version of HSM as a solution. It's not clear from
the manual whether this can be accomplished in some fashion.

Thanks
Jaime

Quoting Dominic Mueller-Wicke01 <[email protected]>:



Hi,

In all cases, a recall request will be handled transparently for the
user at the time a migrated file is accessed. This can't be prevented and
has two downsides: a) the space used in the file system increases, and
b) random access to storage media on the Spectrum Protect server happens.
With newer versions of Spectrum Protect for Space Management, a so-called
tape-optimized recall method is available that can reduce the impact on
the system (especially the Spectrum Protect server).
If the problem was that the file system ran out of space at the time the
recalls came in, I would recommend reducing the threshold settings for
the file system and increasing the number of premigrated files. This will
allow space to be freed very quickly when needed. If you haven't used
policy-based threshold migration so far, I recommend it; this method is
significantly faster than the classical HSM-based threshold migration
approach.
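A policy-based threshold migration rule has roughly the following shape. The pool names, thresholds, weight expression, and the EXEC script path below are placeholders; the exact syntax and the correct HSM interface script should be taken from the Spectrum Scale ILM documentation for your release.

```
/* Illustrative only -- pool names, thresholds, and the EXEC script path
   are placeholders, not a tested configuration. */
RULE EXTERNAL POOL 'hsm'
  EXEC '/path/to/hsm/interface/script'   /* site-specific HSM interface */

/* Start migrating at 90% pool occupancy, stop at 80%,
   draining the least recently accessed files first. */
RULE 'ToTape' MIGRATE FROM POOL 'system'
  THRESHOLD(90,80)
  WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME)
  TO POOL 'hsm'
```

Such a rule is installed with mmchpolicy and driven by the file system's low-space events, which is what makes it react faster than the classical HSM scout-based threshold migration.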

Greetings, Dominic.


----- Forwarded by Dominic Mueller-Wicke01/Germany/IBM on 08.03.2016
18:21
-----

From:         Jaime Pinto <[email protected]>
To:         gpfsug main discussion list
<[email protected]>
Date:         08.03.2016 17:36
Subject:         [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration
priority
Sent by:         [email protected]



I'm wondering whether the new version of the "Spectrum Suite" will
allow us to set the priority of HSM migration higher than that of
staging.


I ask this because back in 2011, when we were still using Tivoli HSM
with GPFS, during mixed requests for migration and staging operations
we saw a very annoying behavior in which staging would always take
precedence over migration. The end result was that GPFS would fill up
to 100% and induce a deadlock on the cluster, unless we identified all
the user-driven stage requests in time and killed them all. We contacted
IBM support a few times asking for a way to fix this, and were told it
was built into TSM. Back then we gave up on IBM's HSM primarily for this
reason, although performance was also a consideration (more on this in
another post).

We are now reconsidering HSM for a new deployment, but only if this
issue has been resolved (among a few others).

What has been some of the experience out there?

Thanks
Jaime




---
Jaime Pinto
SciNet HPC Consortium  - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.org
University of Toronto
256 McCaul Street, Room 235
Toronto, ON, M5T1W5
P: 416-978-2755
C: 416-505-1477

----------------------------------------------------------------
This message was sent using IMP at SciNet Consortium, University of
Toronto.


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss










          ************************************
           TELL US ABOUT YOUR SUCCESS STORIES
          http://www.scinethpc.ca/testimonials
          ************************************




