Taragolis commented on PR #29156: URL: https://github.com/apache/airflow/pull/29156#issuecomment-1404751961
> The proposed fix solves my problem.

And it looks like it might add problems for others. See [Best Practices -> Top level Python Code](https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html#top-level-python-code):

```text
Specifically you should not run any database access, heavy computations and networking operations.
```

BTW, did you try your solution on your own?

---

> this file needs to be uploaded to the FTP server - instead of downloading the file, I want to use smart_open and pass the file handle as a parameter to the FTP provider.

I still don't see how that would work without spawning a dozen additional connections. Just use the Hook inside the task's execution (if it has an appropriate method to solve your issue), not at the top level of the DAG file:

```python
import pendulum
import smart_open

from airflow import DAG
from airflow.decorators import task
from airflow.providers.ftp.hooks.ftp import FTPHook

with DAG(
    dag_id="abstract_dag",
    start_date=pendulum.datetime(2022, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
    tags=["abstract"],
):

    @task.docker(image="foo:bar-slim")
    def first_task():
        # In place of this comment there should be code that produces
        # the file in S3/Azure/GCS and returns its URI.
        return uri

    @task
    def second_task(uri):
        with smart_open.open(uri) as fp:
            # BTW, how do you plan to manage connections to S3/Azure/GCS?
            ftp_hook = FTPHook(...)
            ftp_hook.use_appropriate_method(...)

    second_task(first_task())
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
