dabla commented on PR #37103: URL: https://github.com/apache/airflow/pull/37103#issuecomment-1970989255
> > Also added a DataToADLSOperator which uses the AzureDataLakeStorageV2Hook which allows uploading data (like from an XCOM of a previous task for example as we had the case) to a remote file without the need to create a local file first.
>
> I am not sure if this is a valid use case (I think we had this discussion before on another operator). What exactly are we saving by having DataToADLSOperator? xcom data is small so it can't be performance(?)
>
> Let's please remove this part from the PR as it's not related to the bug fix. You can start a separate PR adding the operator.

Hello @eladkal, ok, I'll remove it, but note that XComs don't always have to be small. In our case we have a custom XCom backend which stores large sets of data (like paged results of a REST API) on a PersistentVolumeClaim. That means only the reference to the file is stored in the Airflow database, while the data itself resides on the filesystem, which is shared across the different workers. That way we have operators which retrieve data and store the results as XComs, and the subsequent operator, here the DataToADLSOperator, takes the result of that XCom and stores it on Fabric.

We could of course first write a Python task which takes the XCom and writes it to a file before passing it to the Azure operator, but that would mean invoking an additional worker just for that step. With this operator you can skip that step, which also means less code to maintain and a more readable, concise DAG.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
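The custom-XCom-backend pattern described in the comment can be sketched without Airflow itself. The following is a minimal, framework-free illustration, not the PR's actual code: in a real deployment you would subclass Airflow's `BaseXCom` and override `serialize_value`/`deserialize_value`, and `SHARED_DIR` would be the PersistentVolumeClaim mount visible to every worker (the names below are hypothetical).

```python
import json
import tempfile
import uuid
from pathlib import Path

# Hypothetical shared mount point; in the setup described in the comment this
# would be the PersistentVolumeClaim mounted into every worker.
SHARED_DIR = Path(tempfile.gettempdir()) / "xcom-backend-demo"
PREFIX = "file://"


def serialize_value(value):
    """Write a large payload to the shared volume and return only a reference.

    Only this short reference string is stored in the Airflow metadata
    database; the bulky data stays on the shared filesystem.
    """
    SHARED_DIR.mkdir(parents=True, exist_ok=True)
    path = SHARED_DIR / f"{uuid.uuid4()}.json"
    path.write_text(json.dumps(value))
    return f"{PREFIX}{path}"


def deserialize_value(reference):
    """Resolve a stored reference back to the actual payload."""
    if not reference.startswith(PREFIX):
        raise ValueError(f"not a shared-volume reference: {reference!r}")
    return json.loads(Path(reference[len(PREFIX):]).read_text())


# A downstream operator (such as the proposed DataToADLSOperator) would pull
# the XCom and transparently get the real data back, with no intermediate
# task that writes a local file first.
ref = serialize_value({"pages": [{"id": 1}, {"id": 2}]})
data = deserialize_value(ref)
```

This is why the size argument cuts both ways: the value passed between tasks is tiny (a path-like reference), but the payload it points to can be arbitrarily large, so an operator that accepts XCom data directly does save a real step.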
