Hi Kristy,

Yes, thanks for picking this up.
So we (UoB) have 3 GPFS environments, each with a different approach.

1. OpenStack (GPFS as infrastructure) - we don't back this up at all. Partly this is because we are still in the pilot phase, and partly because we also have ~7PB of Ceph over 4 sites for this project; the longer-term aim is to ensure data sets and important VM images are copied into the Ceph store (and then replicated to at least 1 other site). We have some challenges with this - how should we do it? We're thinking about maybe going down the iRODS route: policy-scan the FS, add an xattr onto important data, and use that to get iRODS to send copies into Ceph (somehow). So this would be a bit of a hybrid home-grown solution. Anyone got suggestions about how to approach this? I know IBM is now an iRODS consortium member - any magic coming from IBM to integrate GPFS and iRODS?

2. HPC. We differentiate on our HPC file system between backed-up and non-backed-up space. Mostly it's non-backed-up, where we encourage users to keep scratch data sets. We provide a small(ish) home directory which is backed up with TSM to tape, and we also back up applications and system configs. We use a bunch of jobs to sync some configs into a local git repository, which is itself stored in the backed-up part of the FS, so things like switch configs and Icinga config can be backed up sanely.

3. Research Data Storage. This is a large bulk data storage solution. So far it's not actually that large (a few hundred TB), so we take the traditional TSM back-to-tape approach (it's also synchronously replicated between data centres). We're already starting to see some possible slowness with data ingest, and we've only just launched the service - possibly that's just a launch effect and ingest will settle down. We are also experimenting with HSM to tape, but other than that we have no other ILM policies - only two tiers of disk, SAS for metadata and NL-SAS for bulk data.
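For the xattr-tagging idea in (1), a minimal GPFS policy sketch might look something like the following. To be clear, this is just an illustration of the approach, not something we run - the xattr name and the external script path are made up:

```
/* Sketch: list files tagged with a user.irods_archive xattr and hand
   the list to an external script that would drive iRODS ingest.
   Both the xattr name and the script path are invented examples. */
RULE EXTERNAL LIST 'to-irods'
     EXEC '/usr/local/sbin/irods-ingest.sh'
RULE 'tag-scan' LIST 'to-irods'
     WHERE XATTR('user.irods_archive') = 'yes'
```

You'd run that periodically with mmapplypolicy (e.g. mmapplypolicy fsname -P irods.pol) and the script would receive batches of matching paths to push into Ceph via iRODS.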
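The "sync configs into local git" approach in (2) can be as simple as a cron job along these lines. The paths and the demo config file here are invented for illustration; in practice the repo lives on the backed-up part of the FS and the configs are pulled from the switches/Icinga host:

```shell
#!/bin/sh
# Sketch of a config-into-git sync job. REPO would really be a
# directory on the backed-up filesystem, not /tmp.
set -e
REPO=/tmp/cfg-backup-demo
mkdir -p "$REPO" && cd "$REPO"
git init -q .
git config user.email cfg-sync@example.com   # invented identity
git config user.name  cfg-sync

# Stand-in for fetching the real config from a switch / Icinga host
echo "hostname sw01" > switch01.cfg

git add -A
# Commit only when something actually changed since the last sync
git diff --cached --quiet || git commit -q -m "config sync $(date +%F)"
git log -1 --oneline
```

Because the repo sits in the backed-up area, TSM then picks up the full history, so you get versioned configs for free.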
I'd like to see a flash tier in there for metadata, which would free up the SAS drives, and then we might get more into ILM policies. We have some more testing with snapshots to do, and have some questions about recovery of HSM'd files if the FS is snapshotted. Anyone any experience with this on 4.1-and-up versions of GPFS?

Straight TSM backup for us means we can end up with 6 copies of data: once per data centre, backup + offsite backup tape set, HSM pool + offsite copy of the HSM pool. (If an HSM tape fails, how do we know what to restore from backup? Hence we make copies of the HSM tapes as well.) As our backups run on TSM, it uses the policy engine and mmbackup, so we only back up changed and new files, and never back up the same file twice from the FS.

Does anyone know how TSM backups handle xattrs? This is one of the questions that was raised at "meet the devs". Or even other attributes like immutability - unless you are in compliant mode, it's possible for immutable files to be deleted in some cases. In fact this is an interesting topic; it just occurred to me: what happens if your HSM tape fails and it contained immutable files? Would it be possible to recover these files if you don't have a copy of the HSM tape - can you do a synthetic recreate of the TSM HSM tape from backups? We typically tell users that backups are for DR purposes, but that we'll make efforts to restore files subject to resource availability.

Is anyone using SOBAR? What is your rationale for this? I can see that at scale there are a lot of benefits to it, but how do you handle users corrupting/deleting files etc.? My understanding of SOBAR is that it doesn't give you the same ability to recover versions of files, deletions etc. that straight TSM backup does. (This is something I've been meaning to raise here for a while.)

So what do others do? Do you have similar approaches to not backing up some types of data/areas? Do you use TSM or home-grown solutions? Or even other commercial backup solutions?
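For context, the mmbackup flow described above (changed/new files only, driven by the policy engine) is typically run against a snapshot so the backup sees a consistent view. A command sketch - the file system name, snapshot name and TSM server stanza below are invented, and this obviously only runs on a GPFS cluster:

```
# Sketch only: fs/snapshot names and TSM server stanza are assumptions.
mmcrsnapshot gpfs01 backupsnap                 # consistent point-in-time view
mmbackup /gpfs/gpfs01 -t incremental \
         -S backupsnap --tsm-servers TSMSRV1   # policy-driven incremental to TSM
mmdelsnapshot gpfs01 backupsnap                # clean up the snapshot afterwards
```

The policy engine does the file-list generation, which is what avoids walking the whole namespace and backing anything up twice.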
What are your rationales for making decisions on backup approaches? Has anyone built their own DMAPI-type interface for doing these sorts of things? Snapshots only? Do you allow users to restore files themselves? If you are using ILM, are you doing it with straight policy, or is TSM playing a part?

(If people want to comment on this without committing their company on-list, I'm happy to take email to the chair@ address and forward it on anonymously to the group.)

Simon

On 26/10/2015, 02:38, "Kallback-Rose, Kristy A" <[email protected]> wrote:

>Simon wrote recently in the GPFS UG Blog: "We also got into discussion on
>backup and ILM, and I think its amazing how everyone does these things in
>their own slightly different way. I think this might be an interesting
>area for discussion over on the group mailing list. There's a lot of
>options and different ways to do things!"
>
>Yes, please! I'm *very* interested in what others are doing.
>
>We (IU) are currently doing a POC with GHI for DR backups (GHI = GPFS HPSS
>Integration; we have had HPSS for a very long time), but I'm interested in
>what others are doing with either ILM or other methods to brew their own
>backup solutions, how much they are backing up and with what regularity,
>what resources it takes, etc.
>
>If you have anything going on at your site that's relevant, can you
>please share?
>
>Thanks,
>Kristy
>
>Kristy Kallback-Rose
>Manager, Research Storage
>Indiana University

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
