dabla commented on PR #37103:
URL: https://github.com/apache/airflow/pull/37103#issuecomment-1970989255

   > > Also added a DataToADLSOperator which uses the 
AzureDataLakeStorageV2Hook which allows uploading data (like from an XCOM of a 
previous task for example as we had the case) to a remote file without the need 
to create a local file first.
   > 
   > I am not sure if this is a valid use case (I think we had this discussion 
before on another operator). What exactly are we saving by having 
DataToADLSOperator? XCom data is small, so it can't be about performance(?)
   > 
   > Let's please remove this part from the PR as it's not related to the bug 
fix. You can start a separate PR to add the operator.
   
   Hello @eladkal, ok, I'll remove it, but note that XComs don't always have to 
be small. In our case we have a custom XCom backend which stores large sets 
of data (like paged results of a REST API) on a PersistentVolumeClaim, which 
means only the reference to the file is stored in the Airflow database, 
while the data itself resides on the filesystem, which is shared across the 
different workers. That way we have operators which retrieve the data and 
store those results as XComs, and the consecutive operator, here the 
DataToADLSOperator, takes the result of that XCom and stores it on Fabric. We 
could of course first write a Python task which takes the XCom and writes it 
to a file before passing it to the Azure operator, but that would mean we 
need to invoke an additional worker just to do that, while with this operator 
you can skip that extra step, which also means less code to maintain and a 
more readable and concise DAG.
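   To make the "reference, not data" backend idea concrete, here is a minimal, 
hedged sketch of the pattern. It does not use the Airflow API directly (in a real 
backend these would be `serialize_value`/`deserialize_value` staticmethods on a 
subclass of `airflow.models.xcom.BaseXCom`); the `SHARED_VOLUME` path is a 
stand-in for a PersistentVolumeClaim mount visible to all workers, and every name 
below is illustrative, not the actual backend from our deployment:

```python
import json
import uuid
from pathlib import Path

# Stand-in for a PersistentVolumeClaim mount point shared by all workers.
# This path is an assumption for the sketch, not an Airflow convention.
SHARED_VOLUME = Path("/tmp/xcom-volume")


def serialize_value(value):
    """Write the payload to the shared volume; return only a file reference.

    Only this short reference string would land in the Airflow metadata
    database, keeping the database small even for large payloads.
    """
    SHARED_VOLUME.mkdir(parents=True, exist_ok=True)
    ref = SHARED_VOLUME / f"{uuid.uuid4()}.json"
    ref.write_text(json.dumps(value))
    return str(ref)


def deserialize_value(ref):
    """Resolve the stored reference back to the actual payload.

    A downstream task (e.g. an upload operator) running on a different
    worker can read the data as long as the volume is mounted there too.
    """
    return json.loads(Path(ref).read_text())
```

   A downstream operator then only needs the reference from the XCom table to 
reach the full payload, which is what lets an upload operator consume large 
results without an intermediate file-writing task.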


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
