On Tue, 2017-07-25 at 09:30 +0000, John Hearns wrote:
> I agree with Jonathan.
>
> In my experience, if you look at why there are many small files being
> stored by researchers, these are either the results of data acquisition
> - high speed cameras, microscopes, or in my experience a wind tunnel.
> Or the images are a sequence of images produced by a simulation which
> are later post-processed into a movie or Ensight/Paraview format. When
> questioned, the researchers will always say "but I would like to keep
> this data available just in case". In reality those files are never
> looked at again. And as has been said if you have a tape based
> archiving system you could end up with thousands of small files being
> spread all over your tapes. So it is legitimate to make zips / tars of
> directories like that.
>
Note that rules on data retention may require researchers to keep the
data for 10 years, so wanting to keep it is not unreasonable. Letting
them spew thousands of files into an "archive" is not sensible, though.

I was thinking of ways of getting the users to do it, and I guess
leaving them with zero available file number quota in the new system
would force them to zip up their data so they could add new stuff ;-)

Archives in my view should have no quota on the space, only quotas on
the number of files. Of course that might not be very popular.

On reflection I think I would use a policy to restrict the archive to
files ending with .zip/.ZIP only. It's an archive and this format is
effectively open source, widely understood and cross platform, and with
the ZIP64 version will now stand the test of time too.

Given it's an archive I would have a script that ran around setting all
the files to immutable 7 days after creation too. Or maybe change the
ownership and set a read-only ACL for the original user. You need to
stop them changing stuff after the event if you are going to use it as
part of your anti research fraud measures.

JAB.

-- 
Jonathan A. Buzzard                 Email: jonathan (at) buzzard.me.uk
Fife, United Kingdom.
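
For what it's worth, the "lock it down after 7 days" script could look
roughly like the sketch below. It is only an illustration under some
assumptions: it strips the POSIX write bits as a stand-in for a real
immutable flag (on GPFS you would presumably use the file system's own
immutability attribute instead), it uses the modification time as a
proxy because POSIX does not record a creation time, and the function
name lock_old_zips is made up for this example.

```python
import os
import stat
import time

SEVEN_DAYS = 7 * 24 * 60 * 60  # age threshold in seconds


def lock_old_zips(root, min_age=SEVEN_DAYS, now=None):
    """Remove write permission from .zip/.ZIP files older than min_age.

    Returns the list of paths whose permissions were changed.
    """
    now = time.time() if now is None else now
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    locked = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Only archive files are expected here; skip anything else.
            if not name.lower().endswith(".zip"):
                continue
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            # Assumption: mtime stands in for "creation" time.
            if now - st.st_mtime < min_age:
                continue
            # Only touch files that are still writable by someone.
            if st.st_mode & write_bits:
                os.chmod(path, stat.S_IMODE(st.st_mode) & ~write_bits)
                locked.append(path)
    return locked
```

Run from cron as a user who owns the files (or root), e.g.
lock_old_zips("/archive/projects") with a hypothetical archive path.
Note that the owner can simply chmod the file back, which is why a true
immutable flag or a change of ownership is the better choice for the
anti-fraud use case.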
