turbaszek opened a new pull request #12327:
URL: https://github.com/apache/airflow/pull/12327
The init_on_load method used the deserialize_value method, which
in the case of custom XCom backends may perform requests to external
services (for example, downloading a file from a bucket).
This is problematic because the request would be sent wherever we
query XCom (for example, when listing XComs in the web UI). This PR proposes
implementing orm_deserialize_value, which allows overriding this behavior.
By default we use BaseXCom.deserialize_value.
closes: #12315
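To illustrate the idea without a running Airflow installation, here is a minimal, self-contained sketch. `BaseXComSketch` and `RemoteXComSketch` are hypothetical stand-ins, not Airflow classes, and the `downloads` counter only simulates a request to an external service. The sketch assumes `orm_deserialize_value` is an instance method called when a row is loaded from the database; the signature merged in this PR may differ.

```python
import json
from typing import Any


class BaseXComSketch:
    """Hypothetical stand-in for BaseXCom; holds the serialized value of one row."""

    def __init__(self, value: bytes):
        self.value = value

    @staticmethod
    def serialize_value(value: Any) -> bytes:
        return json.dumps(value).encode("utf-8")

    @staticmethod
    def deserialize_value(result: "BaseXComSketch") -> Any:
        return json.loads(result.value.decode("utf-8"))

    def orm_deserialize_value(self) -> Any:
        # Default for ORM loading: plain deserialization of the stored value,
        # bypassing whatever a custom backend does in deserialize_value.
        return BaseXComSketch.deserialize_value(self)


class RemoteXComSketch(BaseXComSketch):
    """Hypothetical backend that stores only a prefixed reference in the table."""

    PREFIX = "xcom_gs://"
    downloads = 0  # counts simulated requests to the external service

    @staticmethod
    def deserialize_value(result: "BaseXComSketch") -> Any:
        ref = BaseXComSketch.deserialize_value(result)
        if isinstance(ref, str) and ref.startswith(RemoteXComSketch.PREFIX):
            RemoteXComSketch.downloads += 1  # stands in for a bucket download
            return {"downloaded_from": ref}
        return ref

    def orm_deserialize_value(self) -> Any:
        # Override: show a short description of the reference, with no download.
        ref = BaseXComSketch.deserialize_value(self)
        return f"remote object: {ref}"


row = RemoteXComSketch(RemoteXComSketch.serialize_value("xcom_gs://data_tmp"))
listed = row.orm_deserialize_value()               # what a listing view would show
pulled = RemoteXComSketch.deserialize_value(row)   # what a task pulling the XCom gets
```

In this sketch, listing the row never touches the external service; only the explicit `deserialize_value` call, which a task pulling the XCom would use, triggers the simulated download.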
I'm testing this with the following backend:
```py
from tempfile import NamedTemporaryFile
from typing import Any

import pandas as pd
from airflow.models.xcom import BaseXCom
from airflow.providers.google.cloud.hooks.gcs import GCSHook


class GCSXComBackend(BaseXCom):
    PREFIX = "xcom_gs://"
    BUCKET_NAME = "airflow-xcom-backend"

    @staticmethod
    def serialize_value(value: Any):
        # Upload DataFrames to GCS and keep only a prefixed reference in the XCom table.
        if isinstance(value, pd.DataFrame):
            hook = GCSHook()
            with NamedTemporaryFile("w+") as f:
                object_name = "data_" + f.name.replace("/", "_")
                value.to_csv(f.name)
                f.flush()
                hook.upload(
                    bucket_name=GCSXComBackend.BUCKET_NAME,
                    object_name=object_name,
                    filename=f.name,
                )
            value = GCSXComBackend.PREFIX + object_name
        return BaseXCom.serialize_value(value)

    @staticmethod
    def deserialize_value(result) -> Any:
        result = BaseXCom.deserialize_value(result)
        # A prefixed string is a reference to an object in the bucket: download and parse it.
        if isinstance(result, str) and result.startswith(GCSXComBackend.PREFIX):
            object_name = result.replace(GCSXComBackend.PREFIX, "")
            with GCSHook().provide_file(
                bucket_name=GCSXComBackend.BUCKET_NAME,
                object_name=object_name,
            ) as f:
                f.flush()
                result = pd.read_csv(f.name)
        return result
```
And this is what I see in the XCom table:
<img width="1673" alt="Screenshot 2020-11-12 at 22 18 56"
src="https://user-images.githubusercontent.com/9528307/98997809-6bfa6b00-2535-11eb-90b3-2b57985a4e66.png">
In the logs I can see that the data is uploaded and downloaded as expected.