[ https://issues.apache.org/jira/browse/HADOOP-17975?focusedWorklogId=683159&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-683159 ]

ASF GitHub Bot logged work on HADOOP-17975:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Nov/21 08:37
            Start Date: 18/Nov/21 08:37
    Worklog Time Spent: 10m 
      Work Description: fapifta commented on a change in pull request #3658:
URL: https://github.com/apache/hadoop/pull/3658#discussion_r752011835



##########
File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
##########
@@ -1679,11 +1679,13 @@ private Connection getConnection(ConnectionId remoteId,
     private final boolean doPing; //do we need to send ping message
     private final int pingInterval; // how often sends ping to the server in msecs
     private String saslQop; // here for testing
+    private final AtomicBoolean fallbackToSimpleAuth;
     private final Configuration conf; // used to get the expected kerberos principal name
     
     ConnectionId(InetSocketAddress address, Class<?> protocol, 
                  UserGroupInformation ticket, int rpcTimeout,
-                 RetryPolicy connectionRetryPolicy, Configuration conf) {
+                 RetryPolicy connectionRetryPolicy, Configuration conf,
+                 AtomicBoolean fallbackToSimpleAuth) {

Review comment:
   Let me quote myself here:
   >Note that the AtomicBoolean value is not there to determine whether we are allowed to fall back; that comes from configuration, via the fallbackAllowed variable, into Connection#setupIOStreams. This AtomicBoolean determines during the actual communication which type of authentication will be used, and we only know whether we really have to fall back to simple auth after we have connected to the server address and discovered its expected authentication method.
   
   Please see [this code](https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L870); the fallbackAllowed boolean is what you are talking about. It is initialized based on config [here](https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L1338).
   
   The AtomicBoolean is initialized in the [Connection's setupIOStreams method](https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L808), which is called from [the getConnection method](https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L1614), which also caches the connection based on ConnectionID [here](https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L1637).
   
   The fallbackToSimpleAuth AtomicBoolean is created [in the DfsClient here](https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L1637), and is shared with the SaslDataTransferClient [here](https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L1637), while it is provided to the ipc Client via the protocol proxy between these two points.
   
   Later on the SaslDataTransferClient uses the AtomicBoolean to decide whether to fall back or not [in this method](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferClient.java#L279).
   
   
   So again:
   we can not use the config, because even if the fallback is allowed we do not want to fall back against a secured server, and we do not know whether the server is secure or insecure until we are connected to it.
   The AtomicBoolean is initialized to false and is set to true only after the Connection has been set up and cached based on the ConnectionID, so we can not rely on its boolean value either.
   
   
   So the question still stands: do you still think we should go down this route and tinker with the ConnectionID, instead of setting the AtomicBoolean's value in setupIOStreams when it should be true?
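   To make the problem with the ConnectionID route concrete, here is a minimal, self-contained sketch. It is only a simplified model of the behaviour described above, not the real ipc.Client code; the class and method names merely mirror the real ones. The connection is cached by its id, and the caller's AtomicBoolean is only set while the IO streams are being set up, so a second caller that arrives with a fresh AtomicBoolean and hits the cache never sees its flag flipped.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified model of the behaviour described above; illustrative names only.
public class CachedConnectionSketch {

  static class Connection {
    private boolean streamsSetUp = false;

    // Stand-in for Connection#setupIOStreams: only a connection being set up
    // for the first time discovers the server's auth method and sets the flag.
    synchronized void setupIOStreams(AtomicBoolean fallbackToSimpleAuth) {
      if (streamsSetUp) {
        return; // already connected: returns immediately, flag untouched
      }
      // pretend we connected and the server turned out to be unsecured
      fallbackToSimpleAuth.set(true);
      streamsSetUp = true;
    }
  }

  // Stand-in for the connections cache keyed by ConnectionId.
  private static final ConcurrentHashMap<String, Connection> CONNECTIONS =
      new ConcurrentHashMap<>();

  static Connection getConnection(String connectionId, AtomicBoolean fallback) {
    Connection c = CONNECTIONS.computeIfAbsent(connectionId, k -> new Connection());
    c.setupIOStreams(fallback);
    return c;
  }

  public static void main(String[] args) {
    AtomicBoolean firstClientFlag = new AtomicBoolean(false);
    AtomicBoolean secondClientFlag = new AtomicBoolean(false);

    getConnection("nn:8020", firstClientFlag);   // new connection, flag gets set
    getConnection("nn:8020", secondClientFlag);  // cache hit, setupIOStreams is a no-op

    System.out.println("first client sees fallback:  " + firstClientFlag.get());  // true
    System.out.println("second client sees fallback: " + secondClientFlag.get()); // false
  }
}
```

   The second println shows the state that leaves the second client attempting a SASL handshake against an unsecured DataNode, which is why I would rather set the shared AtomicBoolean's value in setupIOStreams than make it part of the ConnectionID.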






Issue Time Tracking
-------------------

    Worklog Id:     (was: 683159)
    Time Spent: 8h 20m  (was: 8h 10m)

> Fallback to simple auth does not work for a secondary DistributedFileSystem instance
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-17975
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17975
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>            Reporter: István Fajth
>            Assignee: István Fajth
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> The following code snippet demonstrates what is necessary to cause a failure when connecting from a secure cluster to a non-secure cluster with fallback to SIMPLE auth allowed.
> {code:java}
>     Configuration conf = new Configuration();
>     conf.setBoolean("ipc.client.fallback-to-simple-auth-allowed", true);
>     URI fsUri = new URI("hdfs://<nn_uri>");
>     conf.setBoolean("fs.hdfs.impl.disable.cache", true);
>     FileSystem fs = FileSystem.get(fsUri, conf);
>     FSDataInputStream src = fs.open(new Path("/path/to/a/file"));
>     FileOutputStream dst = new FileOutputStream(File.createTempFile("foo", "bar"));
>     IOUtils.copyBytes(src, dst, 1024);
>     // The issue happens even if we re-enable cache at this point
>     //conf.setBoolean("fs.hdfs.impl.disable.cache", false);
>     // The issue does not happen when we close the first FileSystem object
>     // before creating the second.
>     //fs.close();
>     FileSystem fs2 = FileSystem.get(fsUri, conf);
>     FSDataInputStream src2 = fs2.open(new Path("/path/to/a/file"));
>     FileOutputStream dst2 = new FileOutputStream(File.createTempFile("foo", "bar"));
>     IOUtils.copyBytes(src2, dst2, 1024);
> {code}
> The problem is that when the DfsClient is created it creates an AtomicBoolean instance, which is propagated down into the IPC layer, where the Client.Connection instance sets its value in setupIOStreams. This Connection object is cached and re-used to multiplex requests against the same server.
> When a second DfsClient is created, the AtomicBoolean reference in that client is a new AtomicBoolean, but the Client.Connection instance is the same, and as it already has an open socket it returns immediately from setupIOStreams, leaving the fallbackToSimpleAuth AtomicBoolean at the false value it was created with in the DfsClient.
> This AtomicBoolean, on the other hand, controls how the SaslDataTransferClient handles the connection at the layer above; with the value left at the default false, the SaslDataTransferClient of the second DfsClient will not fall back to SIMPLE authentication but will try to perform a SASL handshake when connecting to the DataNode.
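> To make the effect of that flag concrete, here is a short illustrative sketch (not the actual SaslDataTransferClient code; names are simplified): a flag set to true makes the client skip the SASL handshake and use SIMPLE auth, while a flag left at false makes it attempt a handshake that an unsecured DataNode cannot parse.
> {code:java}
> import java.util.concurrent.atomic.AtomicBoolean;
>
> // Illustrative only: the real decision lives in SaslDataTransferClient, this
> // just shows how the shared flag's value selects the auth path.
> public class FallbackDecisionSketch {
>
>   static String negotiate(AtomicBoolean fallbackToSimpleAuth) {
>     if (fallbackToSimpleAuth.get()) {
>       return "SIMPLE (fallback, no SASL handshake)";
>     }
>     return "SASL handshake attempted";
>   }
>
>   public static void main(String[] args) {
>     // first DfsClient: its flag was set to true while its RPC connection was set up
>     AtomicBoolean firstClientFlag = new AtomicBoolean(true);
>     // second DfsClient: the cached RPC connection never touched its new flag
>     AtomicBoolean secondClientFlag = new AtomicBoolean(false);
>
>     System.out.println("first client:  " + negotiate(firstClientFlag));
>     System.out.println("second client: " + negotiate(secondClientFlag));
>   }
> }
> {code}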
>  
> Access to the FileSystem via the second DfsClient fails with exceptions like the following one, and the read then fails with a BlockMissingException as shown below:
> {code}
> WARN hdfs.DFSClient: Failed to connect to /<dn_ip>:<dn_port> for file <file> for block BP-531773307-<nn_ip>-1634685133591:blk_1073741826_1002, add to deadNodes and continue. 
> java.io.EOFException: Unexpected EOF while trying to read response from server
>       at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:552)
>       at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessage(DataTransferSaslUtil.java:215)
>       at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:455)
>       at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getSaslStreams(SaslDataTransferClient.java:393)
>       at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:267)
>       at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:215)
>       at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:160)
>       at org.apache.hadoop.hdfs.DFSUtilClient.peerFromSocketAndKey(DFSUtilClient.java:648)
>       at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2980)
>       at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:822)
>       at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:747)
>       at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:380)
>       at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:658)
>       at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:589)
>       at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:771)
>       at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:840)
>       at java.io.DataInputStream.read(DataInputStream.java:100)
>       at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:94)
>       at DfsClientTest3.main(DfsClientTest3.java:30)
> {code}
> {code}
> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-813026743-<nn_ip>-1495248833293:blk_1139767762_66027405 file=/path/to/file
> {code}
>  
> The DataNode in the meantime logs the following:
> {code}
> ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: <dn_host>:<dn_port>:DataXceiver error processing unknown operation  src: /<client_ip>:<client_port> dst: /<dn_ip>:<dn_port>
> java.io.IOException: Version Mismatch (Expected: 28, Received: -8531 )
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:70)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:222)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
> This happens only if the second client connects to the same DataNode as the first one did, so it might seem intermittent when the clients are reading different files, but it happens every time when the two clients read the same file with replication factor 1.
> We ran into this issue while running the HBase ExportSnapshot tool to move a snapshot from a non-secure to a secure cluster. The issue is loosely related to HBASE-12819, HBASE-20433 and similar problems; I am linking these so that the HBase team will see how this is relevant for them.


