Andrew D. Fant wrote:
I know that the common wisdom on this subject is "don't do that", but for

Shouldn't be an issue if you have a sane distribution and distribution load system, i.e. one that automatically handles the ABI (bit width) during installation/package selection. Distros which do this (mostly) correctly include FCx, SuSE, CentOS, ...

various reasons, I have to look at the possibility of putting a 64-bit system
(probably EM64T as opposed to Opteron) as the user node of our cluster.  I have a
separate management node that handles the batch scheduler, license management
and compute node imaging, and related duties, which would remain a 32-bit Xeon,
so that isn't going to directly factor into the decision.  This is motivated by
a desire to allow users to run interactive jobs on the user node instead of
playing games with wrapper scripts to run them on compute nodes.  My personal
preference would be to have a separate system that can remotely submit to the
existing cluster via the batch queues, but there is a desire by management to
limit the number of different systems that a user needs to know about logging
into.  The 64-bit motivation is mostly about providing adequate memory for
multiple users running GUI applications.

Hmmm... so you want to provide a single 64-bit machine to run GUI code on rather than hacking stuff for the cluster? Assuming I understood this right, apart from contention for that resource, this should be fine. Is there any reason why the SGE/PBS methods (qrsh/qsub -I) wouldn't work? Or is this the pain of which you speak?

Has anyone had any success with this approach, or failing that, any horror
stories that would support the more flexible approach of separating the shell
server from the head node?

I think this is actually a good practice. You really don't want users logging onto a management node to run jobs. You would likely prefer them to run on some sort of user-login-node. Lots of cluster distros do fuse these two. This is assuming a non-SSI machine (e.g. not Scyld/bproc/Clustermatic/...).

The only major issue is that if they then submit a job with a binary which happens to be the wrong ABI, you will get lots of dud runs and unhappy users. You can fix that with some clever defaults on the submission side for each user-login-node.
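
As a rough illustration of what such a submission-side default might look like, here is a minimal sketch of a pre-submit check that reads the ELF header of the job binary and refuses 64-bit binaries when the compute nodes are 32-bit. The function names and the `node_bits` parameter are made up for this example; the only hard fact it relies on is that byte 4 of an ELF file (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects. How you hook it into your qsub wrapper is site-specific and not shown.

```python
ELF_MAGIC = b"\x7fELF"
EI_CLASS_OFFSET = 4            # byte 4 of e_ident holds the ELF class
ELF_CLASSES = {1: 32, 2: 64}   # ELFCLASS32, ELFCLASS64


def elf_bit_width(header):
    """Return 32 or 64 for an ELF header, or None if it is not recognisable ELF."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    return ELF_CLASSES.get(header[EI_CLASS_OFFSET])


def verdict(bits, node_bits=32):
    """Decide whether a binary of the given width may run on node_bits-wide nodes."""
    if bits is None:
        # Shell scripts and the like: nothing to check here, let them through.
        return True
    return bits <= node_bits


def check_submission(path, node_bits=32):
    """Hypothetical hook for a submit wrapper: True if the job binary looks runnable."""
    with open(path, "rb") as f:
        return verdict(elf_bit_width(f.read(5)), node_bits)
```

A wrapper on the 64-bit login node could call `check_submission()` on the executable before handing the job to the scheduler, and print a message telling the user to rebuild with a 32-bit target if it returns False. Note that this only looks at the top-level binary, not at any libraries it loads.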


Thanks,
        Andy



--

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615

_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf