[
https://issues.apache.org/jira/browse/HADOOP-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612019#action_12612019
]
Steve Loughran commented on HADOOP-3694:
----------------------------------------
Assuming this is the cause (and the same stack trace comes up again and
again), this is what the code is trying to do:
DataNode.startDataNode()

    InetSocketAddress ipcAddr = NetUtils.createSocketAddr(        // 1
        conf.get("dfs.datanode.ipc.address"));
    String hostname = ipcAddr.getHostName();                      // 2
    ipcServer = RPC.getServer(this, hostname, ipcAddr.getPort(),  // 3
        conf.getInt("dfs.datanode.handler.count", 3), false, conf);
(1) get the socket address from dfs.datanode.ipc.address, which defaults to
"0.0.0.0:50020"
(2) get the real hostname of that socket address
(3) open an RPC server on that host and port.
Inside NetUtils.createSocketAddr, the configuration string is parsed and the
(hostname, port) values are extracted. The hostname is then turned into a new
address:
1. If there is a static hostname -> hostname' mapping, that is used:
    if (getStaticResolution(hostname) != null) {
      hostname = getStaticResolution(hostname);
    }
2. Otherwise the OS/JVM does the work of resolving the hostname:
    return new InetSocketAddress(hostname, port);
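For what it's worth, that static mapping is settable from test code, so in
principle a test could pin the resolution up front and bypass the resolver
entirely. A minimal sketch, assuming NetUtils.addStaticResolution keeps its
current (host, resolvedName) signature and that this path is actually taken
for every lookup:

    import java.net.InetSocketAddress;
    import org.apache.hadoop.net.NetUtils;

    public class StaticResolutionSketch {
      public static void main(String[] args) {
        // Sketch: pin "localhost" to the IPv4 loopback literal so that
        // createSocketAddr never has to ask the OS resolver for it.
        NetUtils.addStaticResolution("localhost", "127.0.0.1");
        InetSocketAddress addr = NetUtils.createSocketAddr("localhost:50020");
        System.out.println(addr);
      }
    }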
Somehow this is picking up an IPv6 address.
Later, when ipcAddr.getHostName() is called (step 2), an attempt is made to
reverse-DNS that address. Unless your site is running IPv6 reverse DNS, this
isn't going to succeed, but you take a 15-30s hit every time the attempt is
made.
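To see that cost in isolation, something like the following could be run on
the box; it's a throwaway sketch (the class name and argument handling are
mine, not anything in the Hadoop tree), just timing the reverse lookup that
getHostName() triggers:

    import java.net.InetSocketAddress;

    public class ReverseLookupTimer {
      public static void main(String[] args) {
        // Same "host:port" format as dfs.datanode.ipc.address.
        String addr = args.length > 0 ? args[0] : "0.0.0.0:50020";
        int colon = addr.indexOf(':');
        String host = addr.substring(0, colon);
        int port = Integer.parseInt(addr.substring(colon + 1));

        InetSocketAddress ipcAddr = new InetSocketAddress(host, port);
        long start = System.currentTimeMillis();
        // getHostName() can trigger a reverse DNS lookup when the address was
        // built from an IP literal; this is where the 15-30s stall shows up.
        String name = ipcAddr.getHostName();
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(name + " resolved in " + elapsed + "ms");
      }
    }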
I'm going to see how to remove IPv6 from this machine, which has one real and
two virtual interfaces as well as loopback, to see if that makes the problem go
away, or at least gives some mild improvement (see also the sketch after the
interface listing)...
eth0 Link encap:Ethernet HWaddr 00:1C:C4:17:CC:46
inet addr:16.XX.XX.XXX Bcast:16.XX.XX.255 Mask:255.255.252.0
inet6 addr: fe80::21c:c4ff:fe17:cc46/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1561368 errors:0 dropped:0 overruns:0 frame:0
TX packets:12199689 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:325311436 (310.2 MB) TX bytes:2108807940 (1.9 GB)
Interrupt:17
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:2538947 errors:0 dropped:0 overruns:0 frame:0
TX packets:2538947 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:881374392 (840.5 MB) TX bytes:881374392 (840.5 MB)
vmnet1 Link encap:Ethernet HWaddr 00:50:56:C0:00:01
inet addr:192.168.66.1 Bcast:192.168.66.255 Mask:255.255.255.0
inet6 addr: fe80::250:56ff:fec0:1/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:1151 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
vmnet8 Link encap:Ethernet HWaddr 00:50:56:C0:00:08
inet addr:192.168.142.1 Bcast:192.168.142.255 Mask:255.255.255.0
inet6 addr: fe80::250:56ff:fec0:8/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:1151 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
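On the JVM side, an alternative (or complement) to stripping IPv6 from the box
might be to tell the JVM to prefer the IPv4 stack. The property itself is
standard; where best to set it for the datanode or for MiniDFS-based tests is
an open question, so treat this as a sketch only:

    public class PreferIPv4 {
      // Sketch: the property has to be set before any networking classes are
      // initialised, so passing -Djava.net.preferIPv4Stack=true on the JVM
      // command line is the safer route; a static initialiser in a test base
      // class is shown purely as an illustration.
      static {
        System.setProperty("java.net.preferIPv4Stack", "true");
      }

      public static void main(String[] args) throws Exception {
        System.out.println(java.net.InetAddress.getByName("localhost"));
      }
    }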
> if MiniDFS startup time could be improved, testing time would be reduced
> ------------------------------------------------------------------------
>
> Key: HADOOP-3694
> URL: https://issues.apache.org/jira/browse/HADOOP-3694
> Project: Hadoop Core
> Issue Type: Improvement
> Components: test
> Affects Versions: 0.19.0
> Reporter: Steve Loughran
>
> It's taking me 140 minutes to run a test build; looking into the test results,
> it's the 20s startup delay of every MiniDFS cluster that is slowing things
> down. If we could find out why it is taking so long and cut it down, every
> test case that relied on a cluster would be sped up.