On 12/24/2014 10:58 AM, Joe Landman wrote:
On 12/24/2014 10:54 AM, Prentice Bisbal wrote:
Everyone,
Thanks for the feedback you've provided to my query below. I'm glad
I'm not the only one who thought of this, and a lot of you raised
very good points I haven't thought about. While I've been following
parallel filesystems for years, I have very little experience
actually managing them up to this point. (My BG/P came with a GPFS
filesystem for /scratch, but everything was already set up before I
got here, so I've only had to deal with it when something breaks.)
You've all convinced me that this may not be an ideal
arrangement, but if I go this route, GPFS might be a better fit for
this than Lustre (mainly because Chris Samuels has proven it *is*
possible with GPFS, and GPFS has snapshotting).
Joe Landman, as always, has provided a wealth of information, and the
rest of you have pointed out other potential pitfalls with this
approach.
My pleasure ... I do think asking James Cuff, Chris Dwan, and others
running/managing big kit (and the teams running the kit), what they
are doing and why would be quite instructive in a bigger picture sense.
Which to a degree suggests that mebbe a devops/best practices BoF or
talk series, or educational workshop at SC15 wouldn't be a bad thing
... I'd be happy to submit a proposal for this year.
Let me know ...
Actually, several other System Admins and I are trying to get more
emphasis on System Administration at the SC conferences, and to even
have a SysAdmin track. Talking about practical issues in managing
filesystems, like those brought up here, would be a great topic to
include in this.
Thanks again for the feedback, and please keep the conversation going.
Prentice
On 12/23/2014 12:12 PM, Prentice Bisbal wrote:
Beowulfers,
I have limited experience managing parallel filesystems like GPFS or
Lustre. I was discussing putting /home and /usr/local for my cluster
on a GPFS or Lustre filesystem, in addition to using it just for
/scratch. I've never done this before, but it doesn't seem like all
that bad an idea. My logic for this is the following:
1. Users often try to run programs from within /home, which leads to
errors, no matter how many times I tell them not to do that. This
would make the system more user-friendly. I could use
quotas/policies to 'steer' them to other filesystems if needed.
2. Having one storage system to manage is much better than 3.
3. Profit?
Anyway, another person in the conversation felt that this would be
bad, because if someone was running a job that would hammer the
filesystem, it would make the filesystem unresponsive, and keep other
people from logging in and doing work. I'm not buying this concern
for the following reasons:
If a job can hammer your parallel filesystem so that the login nodes
become unresponsive, you've got bigger problems, because that means
other jobs can't run on the cluster, and the job hitting the
filesystem hard has probably slowed down to a crawl, too.
I know there are some concerns with the stability of parallel
filesystems, so if someone wants to comment on the dangers of that,
too, I'm all ears. I think that the relative instability of parallel
filesystems compared to NFS would be the biggest concern, not
performance.
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf