Taragolis commented on PR #29156:
URL: https://github.com/apache/airflow/pull/29156#issuecomment-1404751961

   > The proposed fix solves my problem.
   
   And it looks like it might add problems for others. See [Best Practices -> Top level Python Code](https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html#top-level-python-code):
   
   ```text
   Specifically you should not run any database access, heavy computations and 
networking operations.
   ```
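   To make that concrete, here is a toy sketch (no Airflow involved, hypothetical names) of why top-level calls hurt: the scheduler re-parses every DAG file continuously, so anything outside a task body executes on every parse, not when a task runs:

```python
# Toy stand-in for the scheduler repeatedly re-parsing a DAG file.
# `open_ftp_connection` is a hypothetical placeholder for any expensive
# top-level call (DB access, networking, heavy computation).
calls = {"count": 0}

def open_ftp_connection():
    calls["count"] += 1  # pretend this opens a real connection

def parse_dag_file():
    # BAD: this runs at parse time, not at task-execution time
    open_ftp_connection()

# The scheduler parses DAG files over and over (by default roughly
# every 30 seconds), so the top-level call multiplies quickly:
for _ in range(5):
    parse_dag_file()

print(calls["count"])  # 5 connections opened without a single task running
```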
   
   BTW, did you actually try your solution yourself?
   
   ---
   
   > this file needs to be uploaded to the FTP server - instead of downloading the file, I want to use smart_open and pass the file handle as a parameter to the FTP provider.
   
   I still have no idea how that would work without spawning a dozen additional connections.
   
   
   Just use the Hook inside the task execution (if it has an appropriate method to solve your issue), not in top-level DAG code.
   
   ```python
   import pendulum
   import smart_open

   from airflow import DAG
   from airflow.decorators import task
   from airflow.providers.ftp.hooks.ftp import FTPHook


   with DAG(
       dag_id="abstract_dag",
       start_date=pendulum.datetime(2022, 1, 1, tz="UTC"),
       schedule=None,
       catchup=False,
       tags=["abstract"],
   ):

       @task.docker(image="foo:bar-slim")
       def first_task():
           # Instead of this comment there should be code which produces
           # a file in S3/Azure/GCS and returns its URI
           return uri

       @task
       def second_task(uri):
           # BTW, how do you plan to manage connections to S3/Azure/GCS here?
           with smart_open.open(uri) as fp:
               ftp_hook = FTPHook(...)
               # e.g. store_file(remote_full_path, fp) accepts a file handle
               ftp_hook.use_appropriate_method(...)

       second_task(first_task())
   ```
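
   For the `use_appropriate_method(...)` placeholder above: the FTP provider's `FTPHook` exposes `store_file(remote_full_path, input_handle)`, which accepts a file-like object, so the `smart_open` handle can be streamed straight to FTP without touching local disk. A minimal Airflow-free sketch of that pattern, with a hypothetical `FakeFTPSink` standing in for the hook:

```python
import io

# Minimal sketch (no Airflow, no network): any sink that accepts a
# file-like handle can consume the smart_open handle directly, the way
# FTPHook.store_file(remote_full_path, input_handle) does.
class FakeFTPSink:
    def __init__(self):
        self.stored = {}

    def store_file(self, remote_full_path, input_handle, block_size=8192):
        chunks = []
        while True:
            chunk = input_handle.read(block_size)
            if not chunk:
                break
            chunks.append(chunk)
        self.stored[remote_full_path] = b"".join(chunks)

# Pretend `fp` came from smart_open.open(uri, "rb")
fp = io.BytesIO(b"payload from object storage")
sink = FakeFTPSink()
sink.store_file("/upload/data.bin", fp)
print(sink.stored["/upload/data.bin"])  # b'payload from object storage'
```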

