On 30/11/17 18:01, Skylar Thompson wrote:

[SNIP]

To be fair, a lot of our biomedical/informatics folks have no choice in the
matter because the vendors are imposing a filesystem-as-a-database paradigm
on them. Each of our Illumina sequencers, for instance, generates a few
million files per run, many of which are images containing raw data from
the sequencers that are used to justify refunds for defective reagents.
Sure, we could turn them off, but then we're eating $$$ we could be getting
back from the vendor.


Been there too. What worked for us was a find script that ran through their files, located directories that had not been accessed for a week, zipped them all up, and then nuked the original files.
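A minimal sketch of that kind of cleanup pass, assuming a GNU userland; the function name, the top-level directory argument, and the use of tar/gzip rather than zip are illustrative assumptions, not the original script:

```shell
#!/bin/bash
# Hypothetical sketch: archive top-level directories whose atime is
# older than a week, then remove the originals. Not the original script.
archive_stale() {
    topdir=$1
    # Directories not accessed for more than 7 days (GNU find).
    find "$topdir" -mindepth 1 -maxdepth 1 -type d -atime +7 -print0 |
    while IFS= read -r -d '' dir; do
        # Pack the directory up next to itself...
        if tar -czf "${dir}.tar.gz" -C "$(dirname "$dir")" "$(basename "$dir")"; then
            # ...and only nuke the originals once the archive succeeded.
            rm -rf "$dir"
        fi
    done
}
```

Checking the archive's exit status before deleting is the important bit; a failed pack run must not take the only copy of the data with it.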

The other thing I would suggest: if they want to buy sequencers from brain-dead vendors, that's fine, but they are going to have to pay extra for the storage, because they cost far more than average to store their files. Far too much buying of kit goes on without any thought for the consequences of how to deal with the data it generates.

Then there were the proteomics bunch, who basically just needed a good thrashing with a very large clue stick, because the zillions of files were the result of their own Perl scripts.

JAB.

--
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss