On Wed, Nov 18, 2009 at 11:28 AM, Michael Thomas <[email protected]> wrote:
> IPs are passed to the rack awareness script. We use 'dig' to do the reverse
> lookup to find the hostname, as we also embed the rack id in the worker node
> hostnames.
>
> --Mike
>
> On 11/18/2009 08:20 AM, David J. O'Dell wrote:
>>
>> I'm trying to figure out if I should use IP addresses or DNS names in my
>> rack awareness script.
>>
>> It's easier for me to use DNS names because we have the row and rack
>> number in the name, which means I can dynamically determine the rack
>> without having to manually update the list when adding nodes.
>>
>> However, this won't work if the script is passed IPs as arguments.
>> Does anyone know what is passed to the script (IPs or DNS names)?
>>
>> Relevant docs:
>>
>> http://hadoop.apache.org/common/docs/r0.20.1/cluster_setup.html#Hadoop+Rack+Awareness
>>
>> and
>>
>> http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/net/DNSToSwitchMapping.html#resolve(java.util.List)
>>
It was never clear to me which would be passed, IP or hostname, so I
listed the IP, the short hostname, and the long hostname for each node
just to be safe. And you know things sometimes change with Hadoop
::wink-wink::

I have been meaning to plug my topology script for a while (as I think
it is pretty cool). I keep the topology script and the topology data
separate, like so:

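topology.sh:

#!/bin/bash
# Print the rack for each node argument (IP or hostname), space-separated.
HADOOP_CONF=/etc/hadoop/conf

while [ $# -gt 0 ] ; do
  nodeArg=$1
  result=""
  # Each line of topology.data is "<node> <rack>"; the last match wins.
  while read -r node rack _ ; do
    if [ "$node" = "$nodeArg" ] ; then
      result="$rack"
    fi
  done < "${HADOOP_CONF}/topology.data"
  shift
  # Fall back to the default rack for nodes missing from the data file.
  if [ -z "$result" ] ; then
    echo -n "/default-rack "
  else
    echo -n "$result "
  fi
done
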
topology.data:

hadoopdata1.ec.com /dc1/rack1
hadoopdata1        /dc1/rack1
10.1.1.1           /dc1/rack1

It is great if your hostname reflects the rack name in some parsable
format! Then you do not need to maintain a topology data file like I
do. As of now I generate mine from our asset database.
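
If your names are parsable, something along these lines might do instead
of a data file. Treat it as a rough sketch only: the naming scheme
(dn-r02-k15-n03, meaning row 02 and rack 15) and the function names
rack_from_hostname and hostname_for are made up for illustration, so
adjust the pattern to whatever your hostnames really look like. IPs are
turned into names with a dig reverse lookup, as Mike describes above.

```shell
#!/bin/bash
# Sketch: derive the rack from the hostname instead of a data file.
# ASSUMPTION: hostnames look like "dn-r02-k15-n03.example.com"
# (r<row>, k<rack>). This naming scheme is hypothetical.

rack_from_hostname() {
  host=${1%%.*}                  # drop the domain, keep the short name
  rack=$(printf '%s\n' "$host" |
    sed -n 's|^.*-r\([0-9][0-9]*\)-k\([0-9][0-9]*\)-.*$|/row\1/rack\2|p')
  # sed prints nothing when the name does not match the pattern.
  if [ -z "$rack" ] ; then
    echo "/default-rack"
  else
    echo "$rack"
  fi
}

# Pass hostnames through; reverse-resolve anything made only of digits and dots.
hostname_for() {
  case $1 in
    *[!0-9.]*|"") echo "$1" ;;
    *) dig +short -x "$1" | head -n 1 | sed 's/\.$//' ;;
  esac
}

# Same contract as topology.sh: one rack per argument, space-separated.
for nodeArg in "$@" ; do
  printf '%s ' "$(rack_from_hostname "$(hostname_for "$nodeArg")")"
done
```

Anything that does not match the pattern, including an IP with no
reverse DNS entry, falls back to /default-rack, the same as in my script.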
Good luck!