Hi Tom,
That sounds like it will do the trick. This node is a slave, so its datanode
and tasktracker are started from the master.
- How do I start the cluster without starting the datanode and the
tasktracker on the mini-node slave? Remove it from the slaves file?
- What is the minimum I need to start on the mini-node?
Also, I have replication set to 2, so the data will just get re-replicated
once the mini-node is reconfigured, right? There should be another copy
somewhere on the cluster.
Thanks
Pat
On 6/4/12 2:38 PM, Tom Melendez wrote:
Hi Pat,
Sounds like you would just turn off the datanode and the tasktracker.
Your config will still point to the Namenode and JT, so you can still
launch jobs and read/write from HDFS.
You'll probably want to replicate the data off first of course.
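A minimal sketch of what that could look like, assuming a Hadoop 1.x-style
install with the hadoop scripts on the PATH. The hostname and exclude-file path
below are hypothetical placeholders, and the exclude file only takes effect if
dfs.hosts.exclude in the Namenode's config already points at it. The script
prints the commands by default rather than running them.

```shell
#!/bin/sh
# Sketch only, not a tested procedure. MINI_NODE and EXCLUDE_FILE are
# hypothetical placeholders; adjust them for your cluster.
MINI_NODE="${MINI_NODE:-mini-node.example.com}"
EXCLUDE_FILE="${EXCLUDE_FILE:-/path/to/conf/excludes}"

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1 (on the master, by hand): list the node in the exclude file that
# dfs.hosts.exclude refers to, so the Namenode copies its blocks elsewhere.
echo "Add $MINI_NODE to $EXCLUDE_FILE before continuing."

# Step 2 (on the master): have the Namenode re-read its include/exclude lists.
run hadoop dfsadmin -refreshNodes

# Step 3 (on the master): check progress; repeat until the node is reported
# as decommissioned.
run hadoop dfsadmin -report

# Step 4 (on the mini-node): stop the two worker daemons. The client config
# stays in place, so jobs can still be submitted and HDFS read from here.
run hadoop-daemon.sh stop tasktracker
run hadoop-daemon.sh stop datanode
```

If the node is also listed in conf/slaves on the master, the cluster start
scripts will bring both daemons back up on the next restart, so it would need
to come out of that file as well.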
Thanks,
Tom
On Mon, Jun 4, 2012 at 2:06 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
I have a machine that is part of the cluster, but I'd like to dedicate it to
being the web server and database host while still being able to start jobs
and get data out of HDFS from it. In other words, I'd like its cores, memory,
and disk to be only minimally affected by jobs running on the cluster, yet
still have easy access when I need to get data out.
I assume I can do something like set the maximum number of tasks for the node
to 0, and something similar for HDFS? Is there a recommended way to go about
this?