randr97 opened a new issue #16286: URL: https://github.com/apache/airflow/issues/16286
**Apache Airflow version**: 2.0.1 (should apply to earlier and later versions as well)

**Environment**:
- **Cloud provider or hardware configuration**: AWS, EC2 c5.xlarge
- **OS** (e.g. from /etc/os-release): Ubuntu 18.04
- **Kernel** (e.g. `uname -a`): Linux

**What happened**:

In `airflow.providers.sftp.hooks.sftp.SFTPHook`, when we try to download a file larger than 18 MiB, the download keeps running forever and never completes.

**What you expected to happen**:

The download should have completed in seconds but did not; a file smaller than 18 MiB downloads in a few seconds. This looks like an underlying issue in the `paramiko` library. Related issues on paramiko's GitHub and on Stack Overflow:

1. https://github.com/paramiko/paramiko/issues/926
2. https://stackoverflow.com/questions/12486623/paramiko-fails-to-download-large-files-1gb
3. https://stackoverflow.com/questions/3459071/paramiko-sftp-hangs-on-get

**How to reproduce it**:

1. Create a large file (size > 18 MiB).
2. Put it on an SFTP server.
3. Use the Airflow SFTPHook to download it.
4. You should see the task run forever.

**Anything else we need to know**:

After some exploration I found a solution to the problem and have applied the fix in my project, but it would be great if the community could dig deeper. Link to the solution: https://gist.github.com/vznncv/cb454c21d901438cc228916fbe6f070f

This gist is by @vznncv; credit to him for coming up with the solution.
