[GitHub] [hadoop] fapifta commented on a change in pull request #3658: HADOOP-17975 Fallback to simple auth does not work for a secondary DistributedFileSystem instance

GitBox Wed, 17 Nov 2021 04:04:00 -0800


fapifta commented on a change in pull request #3658:
URL: https://github.com/apache/hadoop/pull/3658#discussion_r751174995




##########
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
##########
@@ -1679,11 +1679,13 @@ private Connection getConnection(ConnectionId remoteId,
     private final boolean doPing; //do we need to send ping message
     private final int pingInterval; // how often sends ping to the server in 
msecs
     private String saslQop; // here for testing
+    private final AtomicBoolean fallbackToSimpleAuth;
     private final Configuration conf; // used to get the expected kerberos 
principal name
     
     ConnectionId(InetSocketAddress address, Class<?> protocol, 
                  UserGroupInformation ticket, int rpcTimeout,
-                 RetryPolicy connectionRetryPolicy, Configuration conf) {
+                 RetryPolicy connectionRetryPolicy, Configuration conf,
+                 AtomicBoolean fallbackToSimpleAuth) {

Review comment:
       Here is the problem again with other words, as I am starting to feel 
that we don't understand each other :)
   
   When a client code initiates a connection to HDFS, the following decisions 
are made:
   - if the DfsClient is cached, then we will have the same DfsClient with the 
same AtomicBoolean to all connections (this is the default)
   - if the DfsClient is not cached, or if we explicitly create a new instance 
without checking the cache (we have API for this) then there will be as many 
distinct DfsClient with as many distinct AtomicBoolean as many are created.
   
   With the distinct AtomicBooleans, there are distinct ipc connection ids, 
which means there are distinct connections.
   Ipc Connections are closed when idle for 10 seconds (by default).
   
   So if we have the AtomicBoolean reference in the ConnectionId, we might 
create connections if distinct DfsClients are created (DfsClients are not 
cached, or via explicit call to create a distinct instance), which are then 
either used by the different clients, or closed after the max idle time.
   
   Note that the AtomicBoolean value is not there to determine if we are 
allowed to fall back, that comes from configuration via a variable 
fallbackAllowed into Connection#setupIOStreams. This atomicBoolean determines 
during the real communication which type of authentication will be used, and we 
do only know if we really have to fall back to simple auth after we connected 
to the server address and discovered its expected authentication method.
   
   
   As I stated earlier, we can not use the boolean value of the AtomicBoolean 
in the key. Here is what happens when we do:
   Let's say we have DfsClient1 and DfsClient2. DfsClient1 does 2 RPC calls to 
the same address then it is not closed, but DfsClient2 is created in the 
meantime which does 1 RPC call to the same address.
   DfsClient1 creates ConnectionID1 with boolean value false.
   DfsClient1 realizes it should fall back to simple auth, so it sets 
AtomicBoolean1 to true
   DfsClient1 sends the first RPC via the connection that has ConnectionID1
   DfsClient1 tries to send the second RPC, and in order to do so it creates a 
connection with ConnectionID2 where the boolean value is true
   DfsClient1 sends the second RPC via the connection that has ConnectionID2
   DfsClient1 goes idle
   DfsClient2 is created with AtomicBoolean2 set to false
   DfsCllient2 initiates the RPC call, and gets the connection with 
ConnectionID1 which has the boolean value as false
   DfsClient2 does yield from setupIOStreams as the connection is already 
established, with that AtomicBoolean2 remains false
   DfsClient2 fails the RPC call with the same error as documented in 
HADOOP-17975 which we are trying to solve.
   
   While if the ConnectionID has the reference to the AtomicBoolean instead of 
the boolean value of it, then DfsClient1 does not create a second connection on 
the second RPC call with ConnectionID2, but DfsClient2 will create a second 
connection with its own AtomicBoolean, and then setupIOStreams is able to 
properly initialize AtomicBoolean2.
   
   >normally we will only have have one connection, only when we have a "true" 
value connection will the second connection be created.
   This might sound appealing, but it is not true, as initially we would always 
have a ConnectionID with the value false, and this false value might change 
during setting up the connection when it is already cached by the false boolean 
value. As we do not know the final value and can not know the final value until 
we haven't set up the sockets and initiated some communication to see what auth 
method the server side has we will never initiate a connection with a 
ConnectionID that contains true. 
   
   
   Yes you are right in that, changing the ConnectionID to have a distinct 
connection per DfsClient for just this specific case, and create multiple 
connections for all the other cases is bad, but if we insists on having this 
property in the ConnectionID, then I do not see any way to avoid this.
   
   Alternatively as this seems to be problematic, we can switch back to my 
previous proposal, which does not increases the number of connections, but just 
sets the AtomicBoolean value in setupIOStreams always when it is called if the 
auth protocol is SASL, and server side auth method is SIMPLE, and the client 
runs with a secured configuration, and if fallback is allowed where these 
conditions are checked in this order lazily.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

[GitHub] [hadoop] fapifta commented on a change in pull request #3658: HADOOP-17975 Fallback to simple auth does not work for a secondary DistributedFileSystem instance

Reply via email to