[ https://issues.apache.org/jira/browse/HADOOP-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586472#comment-14586472 ]
Sangjin Lee commented on HADOOP-12090:
--------------------------------------

This is caused by the Kerberos authentication request being split across TCP packets. In the problem situation, the Kerberos authentication request sent by the client gets split into 2 packets even though it is tiny (e.g. 584 bytes). In this case it is split into one packet with 570 bytes of data and another with 14 bytes.

Tcpdump output:
{noformat}
10:30:32.358645 IP localhost.50199 > localhost.60538: Flags [S], seq 1804572222, win 32792, options [mss 16396,sackOK,TS val 566449661 ecr 0,nop,wscale 8], length 0
10:30:32.358661 IP localhost.60538 > localhost.50199: Flags [S.], seq 2381946627, ack 1804572223, win 1140, options [mss 16396,sackOK,TS val 566449661 ecr 566449661,nop,wscale 0], length 0
10:30:32.358672 IP localhost.50199 > localhost.60538: Flags [.], ack 1, win 129, options [nop,nop,TS val 566449661 ecr 566449661], length 0
10:30:32.358788 IP localhost.50199 > localhost.60538: Flags [.], seq 1:571, ack 1, win 129, options [nop,nop,TS val 566449661 ecr 566449661], length 570
10:30:32.358796 IP localhost.60538 > localhost.50199: Flags [.], ack 571, win 570, options [nop,nop,TS val 566449661 ecr 566449661], length 0
10:30:32.358801 IP localhost.50199 > localhost.60538: Flags [P.], seq 571:585, ack 1, win 129, options [nop,nop,TS val 566449661 ecr 566449661], length 14
{noformat}

It turns out there is a bug in apacheds (on which minikdc is based) where Kerberos message decoding fails with an NPE if the Kerberos message is not contained in a single TCP packet (DIRSERVER-2071).

Furthermore, the TCP fragmentation itself is also related to apacheds. Mina, the underlying I/O framework for apacheds, sets a pretty small receive/send buffer size by default (1 KB). This has the effect of reducing the TCP window size significantly, as evidenced by the tcpdump output above: the server side advertises win 1140 in its SYN-ACK and win 570 after the first data packet. This is what causes the fragmentation.
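
To illustrate why a decoder must not assume one message per TCP packet, here is a minimal sketch of a reassembling reader. It relies on the fact that Kerberos over TCP prefixes each message with a 4-byte big-endian length (RFC 4120, section 7.2.2). The class KerberosFrameAccumulator and its methods are hypothetical names for illustration only; this is not the apacheds or Mina code, and not the actual fix for DIRSERVER-2071. The 570/14 split mirrors the tcpdump output above, assuming the 584 bytes on the wire include the length prefix.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of apacheds or Mina. */
class KerberosFrameAccumulator {
    // Bytes received so far that do not yet form a complete message.
    private byte[] pending = new byte[0];

    /** Prefixes a message with its 4-byte big-endian length (RFC 4120 TCP framing). */
    static byte[] frame(byte[] message) {
        return ByteBuffer.allocate(4 + message.length)
                .putInt(message.length)
                .put(message)
                .array();
    }

    /** Accepts the payload of one TCP packet and returns the messages it completes. */
    List<byte[]> feed(byte[] fragment) {
        // Append the new bytes to whatever was left over from earlier packets.
        byte[] joined = Arrays.copyOf(pending, pending.length + fragment.length);
        System.arraycopy(fragment, 0, joined, pending.length, fragment.length);

        List<byte[]> complete = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(joined);
        while (buf.remaining() >= 4) {
            buf.mark();
            int length = buf.getInt();
            if (length < 0) {
                throw new IllegalArgumentException("invalid length prefix: " + length);
            }
            if (buf.remaining() < length) {
                // Message body is incomplete: rewind and wait for more data
                // instead of handing a null or partial message downstream.
                buf.reset();
                break;
            }
            byte[] message = new byte[length];
            buf.get(message);
            complete.add(message);
        }

        // Keep the unconsumed tail for the next call.
        pending = new byte[buf.remaining()];
        buf.get(pending);
        return complete;
    }

    public static void main(String[] args) {
        byte[] request = frame(new byte[580]); // 584 bytes on the wire
        KerberosFrameAccumulator acc = new KerberosFrameAccumulator();
        List<byte[]> first = acc.feed(Arrays.copyOfRange(request, 0, 570));
        List<byte[]> second = acc.feed(Arrays.copyOfRange(request, 570, 584));
        System.out.println(first.size() + " message(s) after 570 bytes");   // 0
        System.out.println(second.size() + " message(s) after 14 more bytes"); // 1
    }
}
```

The first call returns nothing because only 566 of the 580 message bytes have arrived; the second call completes the message. A decoder that writes its result downstream after the first packet regardless would hit exactly the kind of null message shown in the stack trace below.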

> minikdc-related unit tests fail consistently on some platforms
> --------------------------------------------------------------
>
>                 Key: HADOOP-12090
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12090
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: kms
>    Affects Versions: 2.7.0
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>
> On some platforms all unit tests that use minikdc fail consistently. Those tests include TestKMS, TestSaslDataTransfer, TestTimelineAuthenticationFilter, etc.
> Typical failures on the unit tests:
> {noformat}
> java.lang.AssertionError: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Cannot get a KDC reply)
>     at org.junit.Assert.fail(Assert.java:88)
>     at org.apache.hadoop.crypto.key.kms.server.TestKMS$8$4.run(TestKMS.java:1154)
>     at org.apache.hadoop.crypto.key.kms.server.TestKMS$8$4.run(TestKMS.java:1145)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1645)
>     at org.apache.hadoop.crypto.key.kms.server.TestKMS.doAs(TestKMS.java:261)
>     at org.apache.hadoop.crypto.key.kms.server.TestKMS.access$100(TestKMS.java:76)
> {noformat}
> The errors that cause this failure on the KDC server on the minikdc are a NullPointerException:
> {noformat}
> org.apache.mina.filter.codec.ProtocolDecoderException: java.lang.NullPointerException: message (Hexdump: ...)
>     at org.apache.mina.filter.codec.ProtocolCodecFilter.messageReceived(ProtocolCodecFilter.java:234)
>     at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:434)
>     at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:48)
>     at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:802)
>     at org.apache.mina.core.filterchain.IoFilterAdapter.messageReceived(IoFilterAdapter.java:120)
>     at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:434)
>     at org.apache.mina.core.filterchain.DefaultIoFilterChain.fireMessageReceived(DefaultIoFilterChain.java:426)
>     at org.apache.mina.core.polling.AbstractPollingIoProcessor.read(AbstractPollingIoProcessor.java:604)
>     at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:564)
>     at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:553)
>     at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$400(AbstractPollingIoProcessor.java:57)
>     at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:892)
>     at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:65)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException: message
>     at org.apache.mina.filter.codec.AbstractProtocolDecoderOutput.write(AbstractProtocolDecoderOutput.java:44)
>     at org.apache.directory.server.kerberos.protocol.codec.MinaKerberosDecoder.decode(MinaKerberosDecoder.java:65)
>     at org.apache.mina.filter.codec.ProtocolCodecFilter.messageReceived(ProtocolCodecFilter.java:224)
>     ... 15 more
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)