[ https://issues.apache.org/jira/browse/HADOOP-17975?focusedWorklogId=685453&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-685453 ]
ASF GitHub Bot logged work on HADOOP-17975:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 23/Nov/21 18:32
            Start Date: 23/Nov/21 18:32
    Worklog Time Spent: 10m
      Work Description: hadoop-yetus commented on pull request #3579:
URL: https://github.com/apache/hadoop/pull/3579#issuecomment-976984820

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 1m 20s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 39m 56s | | trunk passed |
| +1 :green_heart: | compile | 29m 29s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 23m 19s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 0m 59s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 38s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 6s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 34s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 27s | | trunk passed |
| +1 :green_heart: | shadedclient | 25m 5s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 59s | | the patch passed |
| +1 :green_heart: | compile | 23m 1s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 23m 1s | | the patch passed |
| +1 :green_heart: | compile | 20m 14s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 20m 14s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 59s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 34s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 3s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 34s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 38s | | the patch passed |
| +1 :green_heart: | shadedclient | 24m 54s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 17m 56s | | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 0m 51s | | The patch does not generate ASF License warnings. |
| | | 222m 21s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3579/12/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3579 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux a1b18c7bfebd 4.15.0-143-generic #147-Ubuntu SMP Wed Apr 14 16:10:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 94cc49efc24cff8e2f0a5c01a7bce17430c4f727 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3579/12/testReport/ |
| Max. process+thread count | 3143 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3579/12/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |

This message was automatically generated.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 685453)
    Time Spent: 10h  (was: 9h 50m)

> Fallback to simple auth does not work for a secondary DistributedFileSystem
> instance
> ------------------------------------------------------------------------------------
>
>                 Key: HADOOP-17975
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17975
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>            Reporter: István Fajth
>            Assignee: István Fajth
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10h
>  Remaining Estimate: 0h
>
> The following code snippet demonstrates what is necessary to cause a
> connection failure from a secure cluster to a non-secure cluster when
> fallback to SIMPLE auth is allowed.
> {code:java}
> Configuration conf = new Configuration();
> conf.setBoolean("ipc.client.fallback-to-simple-auth-allowed", true);
> URI fsUri = new URI("hdfs://<nn_uri>");
> conf.setBoolean("fs.hdfs.impl.disable.cache", true);
> FileSystem fs = FileSystem.get(fsUri, conf);
> FSDataInputStream src = fs.open(new Path("/path/to/a/file"));
> FileOutputStream dst = new FileOutputStream(File.createTempFile("foo", "bar"));
> IOUtils.copyBytes(src, dst, 1024);
> // The issue happens even if we re-enable cache at this point
> //conf.setBoolean("fs.hdfs.impl.disable.cache", false);
> // The issue does not happen when we close the first FileSystem object
> // before creating the second.
> //fs.close();
> FileSystem fs2 = FileSystem.get(fsUri, conf);
> FSDataInputStream src2 = fs2.open(new Path("/path/to/a/file"));
> FileOutputStream dst2 = new FileOutputStream(File.createTempFile("foo", "bar"));
> IOUtils.copyBytes(src2, dst2, 1024);
> {code}
> The problem is that when the DfsClient is created, it creates an instance of
> AtomicBoolean, which is propagated down into the IPC layer, where the
> Client.Connection instance sets its value in setupIOStreams. This connection
> object is cached and re-used to multiplex requests against the same DataNode.
> When a second DfsClient is created, the AtomicBoolean reference in the
> client is a new AtomicBoolean, but the Client.Connection instance is the
> same, and as it already has a socket open to the DataNode, it returns
> immediately from setupIOStreams, leaving the fallbackToSimpleAuth
> AtomicBoolean at false, the value it was created with in the DfsClient.
> This AtomicBoolean, on the other hand, controls how the SaslDataTransferClient
> handles the connection in the layer above, and with this value left at the
> default false, the SaslDataTransferClient of the second DfsClient will not
> fall back to SIMPLE authentication but will try to send a SASL handshake when
> connecting to the DataNode.
>
> Access to the FileSystem via the second DfsClient fails with exceptions
> like the following one, and the read then fails with a BlockMissingException
> like the one below:
> {code}
> WARN hdfs.DFSClient: Failed to connect to /<dn_ip>:<dn_port> for file <file>
> for block BP-531773307-<nn_ip>-1634685133591:blk_1073741826_1002, add to
> deadNodes and continue.
> java.io.EOFException: Unexpected EOF while trying to read response from server
>     at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:552)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessage(DataTransferSaslUtil.java:215)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:455)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getSaslStreams(SaslDataTransferClient.java:393)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:267)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:215)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:160)
>     at org.apache.hadoop.hdfs.DFSUtilClient.peerFromSocketAndKey(DFSUtilClient.java:648)
>     at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2980)
>     at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:822)
>     at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:747)
>     at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:380)
>     at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:658)
>     at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:589)
>     at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:771)
>     at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:840)
>     at java.io.DataInputStream.read(DataInputStream.java:100)
>     at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:94)
>     at DfsClientTest3.main(DfsClientTest3.java:30)
> {code}
> {code}
> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block:
> BP-813026743-<nn_ip>-1495248833293:blk_1139767762_66027405 file=/path/to/file
> {code}
>
> The DataNode in the meantime logs the following:
> {code}
> ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> <dn_host>:<dn_port>:DataXceiver error processing unknown operation src:
> /<client_ip>:<client_port> dst: /<dn_ip>:<dn_port>
> java.io.IOException: Version Mismatch (Expected: 28, Received: -8531 )
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:70)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:222)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> This happens only if the second client connects to the same DataNode as
> the first one did, so it might seem intermittent when the clients are reading
> different files, but it always happens if the two clients read the same file
> with replication factor 1.
> We ran into this issue while running the HBase ExportSnapshot tool to move a
> snapshot from a non-secure to a secure cluster. The issue is loosely related
> to HBASE-12819, HBASE-20433, and similar problems; I am linking these so
> that the HBase team can see how this is relevant for them.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org