OK, so remove the mini-node (client) from the master's slaves file since it's
no longer a node. That way the client won't be started when the
master starts. There is no init.d script on the client, only on the master,
since the client daemons were always started by the master through ssh and
start-all.sh. The config on the client is still set to point to the master, so
hadoop fs will still get data from the cluster. I suppose having init.d scripts
on all the slaves (but not the clients) as well as the master is a better way
to handle power outages, since the machines will come up at slightly different times.
I think I get it now. (correct me if I'm wrong)
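For reference, a minimal sketch of that first step, dropping the client host from the master's slaves file (the hostnames here are made-up placeholders; the real file is typically $HADOOP_HOME/conf/slaves):

```shell
# Hypothetical example: remove the client host from the slaves file so
# start-all.sh no longer launches a datanode/tasktracker on it.
# "mini-node", "node1", "node2" are placeholder hostnames.
cd "$(mktemp -d)"                     # stand-in for $HADOOP_HOME/conf
printf 'node1\nnode2\nmini-node\n' > slaves
grep -v '^mini-node$' slaves > slaves.tmp && mv slaves.tmp slaves
cat slaves                            # now lists only node1 and node2
```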
Thanks,
Pat
On 6/4/12 4:06 PM, Tom Melendez wrote:
Hi Pat,
Sounds like that's the trick. This node is a slave, so its datanode and tasktracker
are started from the master.
- how do I start the cluster without starting the datanode and the
tasktracker on the mini-node slave? Remove it from slaves?
There's no "main" cluster software; just don't start those services.
If you're on Linux and have init.d scripts, look for the ones whose
names end in datanode and tasktracker.
- what do I minimally need to start on the mini-node?
Nothing except the hadoop jars. The presence of the config files in
your CLASSPATH is all you need to talk to your cluster. So, if you
can run hadoop dfs -ls /some/path/in/hdfs and it succeeds, you're
probably OK.
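A sketch of that check (the paths are assumptions; adjust HADOOP_HOME to your install):

```shell
# Hypothetical paths; the client only needs the hadoop jars plus the
# config files that point it at the master.
HADOOP_HOME=${HADOOP_HOME:-/opt/hadoop}
HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-$HADOOP_HOME/conf}

if [ -f "$HADOOP_CONF_DIR/core-site.xml" ]; then
    # config points at the master, so this should list the cluster's files
    "$HADOOP_HOME/bin/hadoop" dfs -ls /some/path/in/hdfs
else
    echo "no core-site.xml in $HADOOP_CONF_DIR; client is not configured" >&2
fi
```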
Also, I have replication set to 2, so the data will just get re-replicated
once the mini-node is reconfigured, right? There should be another copy
somewhere on the cluster.
Probably.
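For reference, that replication factor is the standard dfs.replication property in hdfs-site.xml; a minimal fragment:

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```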
It's not really a "mini-node"; it's just a client at this point, and it
isn't known by your cluster. You could configure your laptop or any
other machine to do the same thing, for example.
Thanks,
Tom