fapifta commented on a change in pull request #3658: URL: https://github.com/apache/hadoop/pull/3658#discussion_r751174995
########## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java ########## @@ -1679,11 +1679,13 @@ private Connection getConnection(ConnectionId remoteId, private final boolean doPing; //do we need to send ping message private final int pingInterval; // how often sends ping to the server in msecs private String saslQop; // here for testing + private final AtomicBoolean fallbackToSimpleAuth; private final Configuration conf; // used to get the expected kerberos principal name ConnectionId(InetSocketAddress address, Class<?> protocol, UserGroupInformation ticket, int rpcTimeout, - RetryPolicy connectionRetryPolicy, Configuration conf) { + RetryPolicy connectionRetryPolicy, Configuration conf, + AtomicBoolean fallbackToSimpleAuth) { Review comment: Here is the problem again with other words, as I am starting to feel that we don't understand each other :) When a client code initiates a connection to HDFS, the following decisions are made: - if the DfsClient is cached, then we will have the same DfsClient with the same AtomicBoolean to all connections (this is the default) - if the DfsClient is not cached, or if we explicitly create a new instance without checking the cache (we have API for this) then there will be as many distinct DfsClient with as many distinct AtomicBoolean as many are created. With the distinct AtomicBooleans, there are distinct ipc connection ids, which means there are distinct connections. Ipc Connections are closed when idle for 10 seconds (by default). So if we have the AtomicBoolean reference in the ConnectionId, we might create connections if distinct DfsClients are created (DfsClients are not cached, or via explicit call to create a distinct instance), which are then either used by the different clients, or closed after the max idle time. Note that the AtomicBoolean value is not there to determine if we are allowed to fall back, that comes from configuration via a variable fallbackAllowed into Connection#setupIOStreams. This atomicBoolean determines during the real communication which type of authentication will be used, and we do only know if we really have to fall back to simple auth after we connected to the server address and discovered its expected authentication method. As I stated earlier, we can not use the boolean value of the AtomicBoolean in the key. Here is what happens when we do: Let's say we have DfsClient1 and DfsClient2. DfsClient1 does 2 RPC calls to the same address then it is not closed, but DfsClient2 is created in the meantime which does 1 RPC call to the same address. DfsClient1 creates ConnectionID1 with boolean value false. DfsClient1 realizes it should fall back to simple auth, so it sets AtomicBoolean1 to true DfsClient1 sends the first RPC via the connection that has ConnectionID1 DfsClient1 tries to send the second RPC, and in order to do so it creates a connection with ConnectionID2 where the boolean value is true DfsClient1 sends the second RPC via the connection that has ConnectionID2 DfsClient1 goes idle DfsClient2 is created with AtomicBoolean2 set to false DfsCllient2 initiates the RPC call, and gets the connection with ConnectionID1 which has the boolean value as false DfsClient2 does yield from setupIOStreams as the connection is already established, with that AtomicBoolean2 remains false DfsClient2 fails the RPC call with the same error as documented in HADOOP-17975 which we are trying to solve. While if the ConnectionID has the reference to the AtomicBoolean instead of the boolean value of it, then DfsClient1 does not create a second connection on the second RPC call with ConnectionID2, but DfsClient2 will create a second connection with its own AtomicBoolean, and then setupIOStreams is able to properly initialize AtomicBoolean2. >normally we will only have have one connection, only when we have a "true" value connection will the second connection be created. This might sound appealing, but it is not true, as initially we would always have a ConnectionID with the value false, and this false value might change during setting up the connection when it is already cached by the false boolean value. As we do not know the final value and can not know the final value until we haven't set up the sockets and initiated some communication to see what auth method the server side has we will never initiate a connection with a ConnectionID that contains true. Yes you are right in that, changing the ConnectionID to have a distinct connection per DfsClient for just this specific case, and create multiple connections for all the other cases is bad, but if we insists on having this property in the ConnectionID, then I do not see any way to avoid this. Alternatively as this seems to be problematic, we can switch back to my previous proposal, which does not increases the number of connections, but just sets the AtomicBoolean value in setupIOStreams always when it is called if the auth protocol is SASL, and server side auth method is SIMPLE, and the client runs with a secured configuration, and if fallback is allowed where these conditions are checked in this order lazily. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org