dabla commented on issue #47201:
URL: https://github.com/apache/airflow/issues/47201#issuecomment-2691361663

   > > > > Duplicate of 
[#47129](https://github.com/apache/airflow/issues/47129) ?
   > > > 
   > > > 
   > > > Yes. I think this will continue to break for the users depending upon 
how they are using it at the moment. For example we have a wrapper around 
SFTPHook that will allow us to do multiple interactions with SFTP server with 
one open connection. For instance, check if certain file matches the pattern, 
send the file to different server(and additionally create directory on the 
destination server) and then delete the file from source.
   > > > ```
   > > >     def my_file_transfer_function():
   > > >         srcConnectionWrapper = FTPConnection(source)
   > > >         destConnectionWrapper = FTPConnection(dest)
   > > >         srcConnectionWrapper.file_exsist(src_file_path)
   > > >         srcConnectionWrapper.sftp_client.getfo(src_file_path, 
destConnectionWrapper.sftp_client.open(
   > > >             dest_file_path, 'wb'))
   > > >         srcConnectionWrapper.delete_file(src_file_path)
   > > > ```
   > > > 
   > > > 
   > > >     
   > > >       
   > > >     
   > > > 
   > > >       
   > > >     
   > > > 
   > > >     
   > > >   
   > > > So now we need to open three connections to source and 2 connections 
to destination server if we were to do the same using 5.1.0. I know this is 
very specific implementation but I am trying to say that we can have all kinds 
of implementation which could break if we change the get_conn() in sftphook
   > > 
   > > 
   > > The reason why this [PR](https://github.com/apache/airflow/pull/44247) 
changed the implementation was that we discovered connection leaking due to the 
caching of the connection in the SFTPHook. Also even if the connection is 
closed using the close_conn method of the SFTPHook, the underlying SSHClient 
isn't closed and caused connections leaks. In you example above, I would then 
opt to get the context managed connection and call the underlying SFTPClient 
methods directly.
   > 
   > While I understand the need for the change, I am just thinking if there 
could be any other way of providing this functionality instead of enforcing to 
change the existing implementation.
   > 
   > Does it really not close the underlying connection even if we call the 
close method that it is offering right now? Although i did not showcase in my 
code, we do use that method to close the connection once we are done with all 
the operations.
   > 
   > I mean If SFTPHook.close_conn which calls the underlying close function of 
paramiko then how adding a closing statement solves it? If I i am not wrong the 
closing function from contextlib will do the same thing right?
   
   The original close_conn only closes the SFTPClient, not the SSHClient, hence 
why the new implementation had 2 closing statements, one for SFTPClient and one 
for SSHClient.  Closing the SFTPClient doesn't close the SSHClient, which 
ultimately the FTP server we connected to was out of connections after a while. 
 We used the SFTPHook in multi threaded environment with lots of connections, 
which where leaking.  The PR I've made solved that issue, as we didn't 
experience connection leaking after the fix.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to