Alejandro Abdelnur wrote:
A while ago I opened an issue related to this topic:
https://issues.apache.org/jira/browse/HADOOP-3287
My take is a little different: when submitting a job, clients
should send the jobtracker only the configuration properties they
explicitly set, and the jobtracker would then apply its own defaults
for all other properties.
This way the cluster admin can modify the defaults at any time, and the
changes take effect for all clients without having to distribute a new
configuration to every client.
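A minimal sketch of the proposed model, written in plain Java with maps rather than Hadoop's real Configuration class. The class and method names (ExplicitOnlyConfig, ClientJobConf, toSubmission, resolve) are hypothetical and illustrate the idea only; the property names and values are just examples. The client records only what the user sets, and the jobtracker side overlays those settings on whatever defaults the cluster currently has.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: not Hadoop's Configuration API, just the submission model.
public class ExplicitOnlyConfig {

    // Client side: remembers only properties the user explicitly set.
    static final class ClientJobConf {
        private final Map<String, String> explicit = new HashMap<>();

        void set(String key, String value) {
            explicit.put(key, value);
        }

        // What is sent to the jobtracker: explicit settings only, no defaults.
        Map<String, String> toSubmission() {
            return new HashMap<>(explicit);
        }
    }

    // Jobtracker side: start from the cluster's current defaults,
    // then let the client's explicit settings override them.
    static Map<String, String> resolve(Map<String, String> clusterDefaults,
                                       Map<String, String> submitted) {
        Map<String, String> effective = new HashMap<>(clusterDefaults);
        effective.putAll(submitted);
        return effective;
    }

    public static void main(String[] args) {
        // Example cluster defaults held only by the jobtracker.
        Map<String, String> defaults = new HashMap<>();
        defaults.put("mapred.reduce.tasks", "1");
        defaults.put("io.sort.mb", "100");

        ClientJobConf job = new ClientJobConf();
        job.set("mapred.reduce.tasks", "10");

        // The admin changes a default on the cluster only; clients are untouched.
        defaults.put("io.sort.mb", "200");

        Map<String, String> effective = resolve(defaults, job.toSubmission());
        System.out.println(effective.get("mapred.reduce.tasks")); // 10
        System.out.println(effective.get("io.sort.mb"));          // 200
    }
}
```

Because the submission never carries io.sort.mb, the admin's change to that default reaches the job without any client-side update, while the client's explicit mapred.reduce.tasks still wins.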
IMO, this approach was the intended behavior at some point, judging by
the Configuration.write(OutputStream) javadocs: 'Writes non-default
properties in this configuration.' But since the write method also
writes the default properties, this is not what happens.
I'll keep an eye on that issue. I think a key problem right now is that
clients take their config both from the configuration file in the core
jar and from their own settings. You need to keep those settings in sync
somehow, and you have to take whatever the core jar provides.
This approach would also remove the need for a separate mechanism
(ZooKeeper, svn, etc.) to keep clients synchronized, since there would
be nothing left to synchronize.
ZooKeeper and similar tools are there to keep the cluster alive; they
shouldn't be needed by clients, which should only need the URL of a
jobtracker to talk to.