Hello HDFS Community! I have a question regarding the information about HDFS block locations. We are building a system on top of HDFS that tries to obey data locality rules, so we are eager to match block locations against machines.
However, when looking at the host names obtained through "BlockLocation#getHosts()", the format seems to vary depending on how the machines are set up. Sometimes the host name is the fully qualified domain name (such as "server1.hdfscluster.company.com") and sometimes it is only the short host name (such as "server1"). It seems to be consistent with neither of the Java methods "InetAddress#getHostName()" and "InetAddress#getCanonicalHostName()". Is there a general rule by which those names are derived?

Thanks for your help,
Stephan
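
For illustration, the kind of matching described above could be sketched roughly as follows. This is only a minimal sketch in plain Java, assuming that comparing the first DNS label of each name is acceptable for the cluster in question. The class HostNameMatch and its methods are hypothetical helpers and not part of Hadoop. The approach does not work for IP addresses, and it treats hosts that share a short name in different domains as the same machine.

```java
import java.util.Locale;

// Hypothetical helper (not a Hadoop class): compares host names by their
// first DNS label so that "server1" and "server1.hdfscluster.company.com"
// are treated as the same machine.
class HostNameMatch {

    // Reduce a host name to its lower-cased first DNS label.
    // Assumption: the input is a host name, not an IP address.
    static String shortName(String host) {
        String h = host.trim().toLowerCase(Locale.ROOT);
        int dot = h.indexOf('.');
        return dot < 0 ? h : h.substring(0, dot);
    }

    // True if both names reduce to the same short host name.
    static boolean sameHost(String a, String b) {
        return shortName(a).equals(shortName(b));
    }

    // True if any host reported for a block matches the given machine name.
    // blockHosts stands in for the array returned by BlockLocation#getHosts().
    static boolean isLocal(String[] blockHosts, String localHost) {
        for (String h : blockHosts) {
            if (sameHost(h, localHost)) {
                return true;
            }
        }
        return false;
    }
}
```

With this, a block reported on "server1" would be considered local to a machine that identifies itself as "server1.hdfscluster.company.com", which is the behaviour we are trying to get without guessing at the naming rule.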