JDarDagran opened a new issue, #40971: URL: https://github.com/apache/airflow/issues/40971
### Apache Airflow Provider(s)

openlineage

### Versions of Apache Airflow Providers

_No response_

### Apache Airflow version

main branch

### Operating System

PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

### Deployment

Other

### Deployment details

_No response_

### What happened

#39530 migrated the provider to the v2 facets of the OpenLineage client. v2 introduces quite a lot of improvements, but it also brought an unforeseen side effect: the facets changed from `@attr.s` to `@attrs.define`, so the default value of `slots` changed from `False` to `True`. That PR changed the parent classes of e.g. `AirflowJobFacet` to the v2 versions, but the child classes still have `slots` set to `False`. This leads to unwanted behaviour when pickling instances of these classes. Taking `AirflowJobFacet` as an example, `__slots__` contains only `_deleted` (coming from the parent class), so pickling fails for the attributes of the child class.
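The mechanism described above can be reproduced without attrs. This is only a minimal stdlib sketch, not the provider code: `SlottedParent`, `DictChild` and `round_trip` are hypothetical names, and the hand-written `__getstate__`/`__setstate__` are an assumption that approximates the state methods attrs generates for slotted classes, which only cover the attribute names of the class they were generated for.

```python
import pickle


class SlottedParent:
    """Stand-in for a parent facet created with slots=True."""

    __slots__ = ("b",)

    def __init__(self, b):
        self.b = b

    # Approximation of the state methods generated for a slotted class:
    # they only know the attribute names of this class, not of subclasses.
    def __getstate__(self):
        return {name: getattr(self, name) for name in SlottedParent.__slots__}

    def __setstate__(self, state):
        for name in SlottedParent.__slots__:
            if name in state:
                setattr(self, name, state[name])


class DictChild(SlottedParent):
    """Stand-in for a child facet declared with slots=False (it has a __dict__)."""

    def __init__(self, b, f):
        super().__init__(b)
        self.f = f  # lives in __dict__, which the inherited __getstate__ ignores


def round_trip(obj):
    """Serialize and deserialize, as happens when an object crosses processes."""
    return pickle.loads(pickle.dumps(obj))


if __name__ == "__main__":
    original = DictChild(b="parent value", f="child value")
    print(original.__getstate__())   # state that pickle will store
    restored = round_trip(original)
    print(restored.b)                # parent attribute is restored
    print(hasattr(restored, "f"))    # child attribute is checked after the round trip
```

In this sketch the child inherits the parent's state methods, so only `b` ends up in the pickled state and `f` is missing after the round trip, which is why reading it raises `AttributeError`.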
The example below illustrates it well:

```python
In [1]: import pickle, attrs

In [2]: @attrs.define(slots=False)
   ...: class A():
   ...:     a: str
   ...:

In [3]: @attrs.define(slots=True)
   ...: class B():
   ...:     b: str
   ...:

In [4]: @attrs.define(slots=False)
   ...: class C(A):
   ...:     c: str
   ...:

In [5]: @attrs.define(slots=False)
   ...: class D(A):
   ...:     d: str
   ...:

In [6]: @attrs.define(slots=True)
   ...: class E(B):
   ...:     e: str
   ...:

In [7]: @attrs.define(slots=False)
   ...: class F(B):
   ...:     f: str
   ...:

In [8]: def test(klazz):
   ...:     try:
   ...:         instance = pickle.loads(pickle.dumps(klazz(**{a.name: a.name for a in attrs.fields(klazz)})))
   ...:         for field in attrs.fields(klazz):
   ...:             getattr(instance, field.name)
   ...:     except AttributeError:
   ...:         print(f"{klazz} failed to unpickle")
   ...:

In [9]: test(A), test(B), test(C), test(D), test(E), test(F)
<class '__main__.F'> failed to unpickle
```

This wasn't caught by unit tests because it only shows up when `ProcessPoolExecutor` is used from within `OpenLineageListener`. Python pickles objects when passing them between processes.

### What you think should happen instead

For two reasons:

1. it avoids migrating to yet another set of facets in the OL client that would change `slots` from `True` to `False`
2. keeping slots on facets does not seem to have a huge impact on performance

I suggest we simply change the `slots` argument to `True` for all facets used in the DAG run state listener hooks.

### How to reproduce

Run breeze with `--integration openlineage` and the OL provider installed from a wheel. Run an example DAG and check the scheduler logs for an error indicating a pickling failure.

### Anything else

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
