[ 
https://issues.apache.org/jira/browse/HADOOP-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612019#action_12612019
 ] 

Steve Loughran commented on HADOOP-3694:
----------------------------------------

Assuming this is the cause (and that the same stack trace comes up again and 
again), this is what the code is trying to do:

DataNode.startDataNode()

    
    InetSocketAddress ipcAddr = NetUtils.createSocketAddr(     // 1
        conf.get("dfs.datanode.ipc.address"));
    String hostname = ipcAddr.getHostName();                         // 2
    ipcServer = RPC.getServer(this, hostname, ipcAddr.getPort(),  // 3
        conf.getInt("dfs.datanode.handler.count", 3), false, conf);

(1) get the socket address from dfs.datanode.ipc.address, which defaults to 
"0.0.0.0:50020"
(2) get the real hostname of the assigned socket
(3) open a server on this hostname and port


Inside NetUtils.createSocketAddr, the configuration string is parsed and the 
(hostname,port) values extracted. This hostname is then turned into a new 
address. 

1. If there is a static hostname -> hostname' mapping, that is used:

    if (getStaticResolution(hostname) != null) {
      hostname = getStaticResolution(hostname);
    }

2. Otherwise the OS/JVM does the work of resolving the address:
    return new InetSocketAddress(hostname, port);
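
The two steps above can be sketched as a small standalone class. This is a 
minimal illustration, not the actual NetUtils source: the class name, the 
STATIC_HOSTS map and the addStaticResolution helper here are hypothetical 
stand-ins, and the "host:port" parsing is deliberately simplistic.

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SocketAddrSketch {
    // Hypothetical stand-in for the static hostname -> hostname' table.
    private static final Map<String, String> STATIC_HOSTS = new ConcurrentHashMap<>();

    static void addStaticResolution(String host, String resolved) {
        STATIC_HOSTS.put(host, resolved);
    }

    static InetSocketAddress createSocketAddr(String target) {
        // Split "host:port" on the last colon.
        int colon = target.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a host:port pair: " + target);
        }
        String hostname = target.substring(0, colon);
        int port = Integer.parseInt(target.substring(colon + 1));

        // Step 1: a static mapping, if present, replaces the hostname.
        String mapped = STATIC_HOSTS.get(hostname);
        if (mapped != null) {
            hostname = mapped;
        }
        // Step 2: otherwise the JVM/OS resolver decides which address to use.
        return new InetSocketAddress(hostname, port);
    }
}
```

Note that a literal such as "0.0.0.0" is parsed directly and never reaches 
DNS at this point; any lookup cost for a literal shows up later, when a name 
is requested for the address.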

Somehow this is picking up an IPv6 address.

Later, when ipcAddr.getHostName() is called (in (2)), an attempt is made to 
reverse-DNS (rDNS) this address. Unless your site is running IPv6 DNS, this 
isn't going to succeed, and you are going to take a 15-30s hit every time an 
attempt is made.
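
A rough way to observe this is to time getHostName() on an address built from 
a literal and compare it with a call that only formats the raw address. The 
sketch below is an illustration under assumptions: the class and method names 
are made up, and InetSocketAddress.getHostString() needs Java 7 or later, so 
it is newer than the code discussed here.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

class ReverseLookupSketch {
    /** Returns a printable host without asking DNS for a reverse mapping. */
    static String hostWithoutLookup(InetSocketAddress addr) {
        InetAddress ip = addr.getAddress();
        // getHostAddress() only formats the address bytes.
        // Unresolved socket addresses have no InetAddress, so fall back
        // to getHostString(), which also avoids a reverse lookup.
        return ip != null ? ip.getHostAddress() : addr.getHostString();
    }

    /** Times getHostName(), which may trigger a reverse lookup. */
    static long timeGetHostNameMillis(InetSocketAddress addr) {
        long start = System.nanoTime();
        addr.getHostName();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        InetSocketAddress addr = new InetSocketAddress("0.0.0.0", 50020);
        System.out.println("no lookup: " + hostWithoutLookup(addr));
        System.out.println("getHostName() took "
            + timeGetHostNameMillis(addr) + " ms");
    }
}
```

How long the timed call takes depends entirely on the local resolver setup, 
so no particular figure should be expected from it.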

I'm going to see how to remove IPv6 from this machine, which has one real and 
two virtual interfaces as well as loopback, to see if this will make the 
problem go away, or at least make some mild improvements:

eth0      Link encap:Ethernet  HWaddr 00:1C:C4:17:CC:46  
          inet addr:16.XX.XX.XXX Bcast:16.XX.XX.255  Mask:255.255.252.0
          inet6 addr: fe80::21c:c4ff:fe17:cc46/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1561368 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12199689 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:325311436 (310.2 MB)  TX bytes:2108807940 (1.9 GB)
          Interrupt:17 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:2538947 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2538947 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:881374392 (840.5 MB)  TX bytes:881374392 (840.5 MB)

vmnet1    Link encap:Ethernet  HWaddr 00:50:56:C0:00:01  
          inet addr:192.168.66.1  Bcast:192.168.66.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:fec0:1/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1151 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

vmnet8    Link encap:Ethernet  HWaddr 00:50:56:C0:00:08  
          inet addr:192.168.142.1  Bcast:192.168.142.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:fec0:8/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1151 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
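
To check what the JVM itself sees, as opposed to ifconfig, the interfaces can 
be enumerated from Java. This is a small sketch with a made-up class name; it 
prints one line per address and labels it inet or inet6, so the output can be 
compared before and after IPv6 is disabled.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;

class InterfaceFamilySketch {
    /** Labels an address the way ifconfig does: inet or inet6. */
    static String family(InetAddress addr) {
        return (addr instanceof Inet6Address) ? "inet6" : "inet";
    }

    public static void main(String[] args) throws SocketException {
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return; // no interfaces reported
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                System.out.println(nic.getName() + "  " + family(addr)
                    + "  " + addr.getHostAddress());
            }
        }
    }
}
```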





> if MiniDFS startup time could be improved, testing time would be reduced
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3694
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3694
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: test
>    Affects Versions: 0.19.0
>            Reporter: Steve Loughran
>
> It's taking me 140 minutes to run a test build; looking into the test 
> results, it's the 20s startup delay of every MiniDFS cluster that is slowing 
> things down. If we could find out why it is taking so long and cut it down, 
> every test case that relies on a cluster would be sped up. 
