Dear Hadoop users, I'm in the process of building a new cluster for our lab and I'm trying to run SGE alongside Hadoop. The idea is that each node would function as a datanode at all times but, depending on the situation, a fraction of the nodes would run SGE jobs instead of plain Hadoop tasks. SGE jobs will not have access to HDFS or the local filesystem (except for /tmp) and will run off an external NAS; they aren't supposed to be I/O bound.
I'm trying to figure out the best way to set up this resource sharing. One way would be to shut down the tasktrackers on reserved nodes and add those nodes to the SGE pool. Another way is to run tasktrackers as SGE jobs, with each tasktracker shutting down after some idle time. Has anyone tried something like this? I'd appreciate any advice. Thanks.
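
P.S. To make the second option more concrete, here is a rough sketch of the idle-timeout logic I have in mind, written in Python. It is only an illustration: the class name IdleWatchdog, the observe method and the timeout value are made up for this example, and how the wrapper job would learn the number of running tasks on a tasktracker is left open.

```python
import time


class IdleWatchdog:
    """Tracks how long a worker has been idle and reports when to stop it.

    Hypothetical helper for illustration; not part of Hadoop or SGE.
    """

    def __init__(self, idle_timeout_s, clock=time.monotonic):
        self.idle_timeout_s = idle_timeout_s
        self.clock = clock          # injectable so the logic can be tested
        self.idle_since = None      # None while busy or before the first poll

    def observe(self, running_tasks):
        """Record one poll; return True once idle for at least the timeout."""
        now = self.clock()
        if running_tasks > 0:
            # Any running task resets the idle period.
            self.idle_since = None
            return False
        if self.idle_since is None:
            # First idle poll: start counting from here.
            self.idle_since = now
        return now - self.idle_since >= self.idle_timeout_s
```

The wrapper submitted to SGE would start the tasktracker, call observe() on every poll, and stop the tasktracker and exit once it returns True, so that SGE gets the slot back.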
