On 12/23/2014 01:35 PM, Michael Di Domenico wrote:
I've always shied away from GPFS/Lustre on /home and favoured NetApps
for one simple reason: snapshots. I can't tell you how many times
people have "accidentally" deleted a file.
We used a NetApp at my last employer for everything (/home, /usr/local,
etc.), and everything used it (desktops, servers, the cluster); the
snapshot feature was priceless. Many users liked being able to restore an
accidentally deleted file themselves. My inclination is
towards GPFS for this reason instead of Lustre, since GPFS supports
snapshotting (and a few other useful features that Lustre doesn't
provide yet).
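For what it's worth, the self-service restore is usually just a copy out of
the snapshot directory. Here's a minimal Python sketch, assuming a GPFS-style
".snapshots" directory at the fileset root (NetApp exposes a similar
".snapshot" directory); the paths are illustrative, not any real layout:

#!/usr/bin/env python3
"""Restore a deleted file from a filesystem snapshot (illustrative paths)."""
import shutil
import sys
from pathlib import Path

SNAP_DIR = Path("/gpfs/home/.snapshots")   # assumed snapshot location

def restore(relative_path, dest_root="/gpfs/home"):
    # Walk snapshots newest-first (assumes date-stamped snapshot names
    # that sort chronologically) and copy back the first surviving copy.
    for snap in sorted(SNAP_DIR.iterdir(), reverse=True):
        candidate = snap / relative_path
        if candidate.is_file():
            target = Path(dest_root) / relative_path
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(candidate, target)
            print("restored %s from snapshot %s" % (relative_path, snap.name))
            return
    sys.exit("%s not found in any snapshot" % relative_path)

if __name__ == "__main__":
    restore(sys.argv[1])

Users can of course just cp the file by hand; the point is that no admin
intervention is needed.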
But yes, the "user education" about running jobs from /home usually
happens at least once a year when someone new starts. We tend to
publicly shame that person, and they don't seem to do it anymore.
You never want to be "that guy" who slowed the whole system down... :)
On Tue, Dec 23, 2014 at 12:12 PM, Prentice Bisbal
<[email protected]> wrote:
Beowulfers,
I have limited experience managing parallel filesystems like GPFS or Lustre.
I was discussing putting /home and /usr/local for my cluster on a GPFS or
Lustre filesystem, in addition to using it for /scratch. I've never
done this before, but it doesn't seem like all that bad an idea. My logic
for this is the following:
1. Users often try to run programs from /home, which leads to errors, no
matter how many times I tell them not to. This would make the system
more user-friendly. I could use quotas/policies to 'steer' them to other
filesystems if needed (see the quota sketch after this list).
2. Having one storage system to manage is much better than 3.
3. Profit?
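For the quotas in #1, the steering can be scripted, too. A rough sketch,
assuming GPFS and the one-line mmsetquota syntax from recent releases
(the device "gpfs0", fileset "home", and the limits are placeholders;
check the man page for your version, since the flags have changed over time):

#!/usr/bin/env python3
"""Apply per-user block quotas on a GPFS /home fileset (placeholders throughout)."""
import pwd
import subprocess

SOFT, HARD = "20G", "25G"            # illustrative limits

def set_home_quotas(device_fileset="gpfs0:home"):
    for user in pwd.getpwall():
        if user.pw_uid < 1000:       # skip system accounts
            continue
        # Roughly: mmsetquota gpfs0:home --user <name> --block 20G:25G
        subprocess.run(["mmsetquota", device_fileset,
                        "--user", user.pw_name,
                        "--block", "%s:%s" % (SOFT, HARD)],
                       check=True)

if __name__ == "__main__":
    set_home_quotas()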
Anyway, another person in the conversation felt that this would be bad,
because if someone were running a job that hammered the filesystem, it
would make the filesystem unresponsive and keep other people from logging
in and doing work. I'm not buying this concern, for the following reasons:
If a job can hammer your parallel filesystem so that the login nodes become
unresponsive, you've got bigger problems, because that means other jobs
can't run on the cluster, and the job hitting the filesystem hard has
probably slowed down to a crawl, too.
I know there are some concerns with the stability of parallel filesystems,
so if someone wants to comment on the dangers of that, too, I'm all ears. I
think that the relative instability of parallel filesystems compared to NFS
would be the biggest concern, not performance.
--
Prentice Bisbal
Manager of Information Technology
Rutgers Discovery Informatics Institute (RDI2)
Rutgers University
http://rdi2.rutgers.edu
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf