turbaszek opened a new pull request #12327:
URL: https://github.com/apache/airflow/pull/12327


   The init_on_load method used the deserialize_value method, which
   for custom XCom backends may send requests to external
   services (for example, downloading a file from a bucket).
   
   This is problematic because the request is sent wherever we query XCom
   (for example, when listing XComs in the web UI). This PR proposes implementing
   orm_deserialize_value, which allows overriding this behavior. By default
   we use BaseXCom.deserialize_value.
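
A rough sketch of the idea, not the code in this PR: the classes below are simplified stand-ins that do not import Airflow. `RemoteXCom`, its `downloads` counter, and the placeholder string format are hypothetical names used only for illustration. The counter takes the place of a real bucket download.

```python
import json
from typing import Any


class BaseXCom:
    """Simplified stand-in for Airflow's BaseXCom (assumption, not the real class)."""

    def __init__(self, value: bytes):
        self.value = value  # serialized payload as stored in the database

    @staticmethod
    def serialize_value(value: Any) -> bytes:
        return json.dumps(value).encode("utf-8")

    @staticmethod
    def deserialize_value(result: "BaseXCom") -> Any:
        return json.loads(result.value.decode("utf-8"))

    def orm_deserialize_value(self) -> Any:
        # Default used when a row is loaded by the ORM: only decode what is
        # stored in the database, without contacting external services.
        return BaseXCom.deserialize_value(self)


class RemoteXCom(BaseXCom):
    """Hypothetical backend that stores only a reference in the database."""

    PREFIX = "xcom_gs://"
    downloads = 0  # counts simulated remote fetches

    @staticmethod
    def deserialize_value(result: "BaseXCom") -> Any:
        value = BaseXCom.deserialize_value(result)
        if isinstance(value, str) and value.startswith(RemoteXCom.PREFIX):
            RemoteXCom.downloads += 1  # a real backend would download here
            return {"downloaded": value[len(RemoteXCom.PREFIX):]}
        return value

    def orm_deserialize_value(self) -> Any:
        # Optional override: show a short placeholder instead of fetching.
        value = BaseXCom.deserialize_value(self)
        if isinstance(value, str) and value.startswith(RemoteXCom.PREFIX):
            return "<remote object " + value[len(RemoteXCom.PREFIX):] + ">"
        return value
```

With this split, loading a row for display calls `orm_deserialize_value` and never triggers the simulated download, while an explicit `RemoteXCom.deserialize_value(row)` still fetches the full object.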
   
   closes: #12315
   
   I'm testing this with the following backend:
   ```py
   from tempfile import NamedTemporaryFile
   from typing import Any

   import pandas as pd

   from airflow.models.xcom import BaseXCom
   from airflow.providers.google.cloud.hooks.gcs import GCSHook


   class GCSXComBackend(BaseXCom):
       PREFIX = "xcom_gs://"
       BUCKET_NAME = "airflow-xcom-backend"
   
       @staticmethod
       def serialize_value(value: Any):
           # DataFrames are uploaded to GCS; only a reference is stored in the DB.
           if isinstance(value, pd.DataFrame):
               hook = GCSHook()
               with NamedTemporaryFile("w+") as f:
                   object_name = "data_" + f.name.replace("/", "_")
                   value.to_csv(f.name)
                   f.flush()
                   hook.upload(
                       bucket_name=GCSXComBackend.BUCKET_NAME,
                       object_name=object_name,
                       filename=f.name
                   )
               value = GCSXComBackend.PREFIX + object_name
           return BaseXCom.serialize_value(value)
   
       @staticmethod
       def deserialize_value(result) -> Any:
           result = BaseXCom.deserialize_value(result)
           # A prefixed string is a reference to an object stored in GCS.
           if isinstance(result, str) and result.startswith(GCSXComBackend.PREFIX):
               object_name = result.replace(GCSXComBackend.PREFIX, "")
               with GCSHook().provide_file(
                   bucket_name=GCSXComBackend.BUCKET_NAME,
                   object_name=object_name,
               ) as f:
                   f.flush()
                   result = pd.read_csv(f.name)
           return result
   ```
   
   And this is what I see in the XCom table:
   <img width="1673" alt="Screenshot 2020-11-12 at 22 18 56" src="https://user-images.githubusercontent.com/9528307/98997809-6bfa6b00-2535-11eb-90b3-2b57985a4e66.png">
   
   In the logs I can see that the data is uploaded and downloaded as expected.
   
   

