Hello,

I wonder how you all see the problem of automatic node discovery: a couple of Hadoop nodes, with no configuration explicitly set, simply discover each other and work together, the way GridGain does. There you just fire up two instances of the product, on the same machine or on different machines in the same LAN, and they use multicast or some similar mechanism to discover each other and become part of a self-discovered topology.

Of course, if you have special network requirements, you should be able to specify undiscoverable nodes by IP or name, but grids are often installed on LANs, and there it should really be simpler.
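
To make the idea concrete, here is a minimal Python sketch of LAN multicast discovery with explicit overrides. It is not Hadoop or GridGain code: the group address, port, message format, and every function name are hypothetical, made up for illustration. Each node would call announce() periodically and listen() to collect its peers, and build_topology() merges what was discovered with any nodes listed explicitly by IP or name.

```python
import json
import socket
import struct

# Hypothetical values chosen for this sketch; they are not Hadoop settings.
MCAST_GROUP = "239.255.42.99"
MCAST_PORT = 45454


def encode_announcement(node_id, role, rpc_port):
    """Serialize a node's self-description for a discovery datagram."""
    return json.dumps({"id": node_id, "role": role, "port": rpc_port}).encode("utf-8")


def decode_announcement(payload, sender_ip):
    """Parse a datagram; return (node_id, info), or None if it is malformed."""
    try:
        msg = json.loads(payload.decode("utf-8"))
        info = {"host": sender_ip, "role": msg["role"], "port": int(msg["port"])}
        return msg["id"], info
    except (ValueError, KeyError, TypeError):
        return None


def build_topology(static_nodes, announcements):
    """Merge discovered nodes with explicitly configured ones; explicit entries win."""
    topology = {}
    for payload, sender_ip in announcements:
        decoded = decode_announcement(payload, sender_ip)
        if decoded is not None:
            node_id, info = decoded
            topology[node_id] = info
    topology.update(static_nodes)
    return topology


def announce(node_id, role, rpc_port):
    """Send one announcement to the multicast group (TTL 1 keeps it on the LAN)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP) as sock:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        sock.sendto(encode_announcement(node_id, role, rpc_port), (MCAST_GROUP, MCAST_PORT))


def listen(timeout_s=2.0):
    """Collect (payload, sender_ip) pairs until no datagram arrives for timeout_s seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    received = []
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", MCAST_PORT))
        # Join the group on the default interface.
        mreq = struct.pack("4s4s", socket.inet_aton(MCAST_GROUP), socket.inet_aton("0.0.0.0"))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        sock.settimeout(timeout_s)
        while True:
            payload, (sender_ip, _port) = sock.recvfrom(4096)
            received.append((payload, sender_ip))
    except socket.timeout:
        pass
    finally:
        sock.close()
    return received
```

With this shape, a node that cannot be reached by multicast is simply passed in through static_nodes, and everything else on the LAN shows up on its own.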

Namenodes are a bit different, since they should run on safer machines, so I'm basically talking about datanodes here. Still, I wonder how hard it would be to have self-assigned namenodes, maybe replicated automatically on several machines, unless one specific namenode is explicitly set via the XML configuration.

Also, the passwordless ssh setup is awkward. If you have a network of Hadoop nodes that mutually discover each other, there is really no need for the passwordless ssh requirement. That is more of a system administration matter: if sysadmins want to automatically deploy or start a program on 5000 machines, they often have the tools and skills to do that, so it should not be a requirement.

What do you think about this?

Best,
Petru
