JDarDagran opened a new issue, #40971:
URL: https://github.com/apache/airflow/issues/40971

   ### Apache Airflow Provider(s)
   
   openlineage
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Apache Airflow version
   
   main branch
   
   ### Operating System
   
   PRETTY_NAME="Debian GNU/Linux 12 (bookworm)" NAME="Debian GNU/Linux"
   VERSION_ID="12" VERSION="12 (bookworm)" VERSION_CODENAME=bookworm ID=debian
   HOME_URL="https://www.debian.org/" SUPPORT_URL="https://www.debian.org/support"
   BUG_REPORT_URL="https://bugs.debian.org/"
   
   ### Deployment
   
   Other
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   In #39530, the OpenLineage provider was migrated to the v2 facets of the OpenLineage client. v2 introduces quite a lot of improvements. However, it seems to have also brought an unforeseen side effect. Moving from `@attr.s` to `@attrs.define` changed the default value of `slots` from `False` to `True`.
   That PR switches the parent classes of, e.g., `AirflowJobFacet` to their v2 versions, but the child classes still set `slots=False`. This leads to unwanted behaviour when pickling instances of these classes. Take `AirflowJobFacet` as an example. Its `__slots__` contains only `_deleted`, inherited from the parent class. As a result, the attributes defined on the child class are lost when an instance is pickled and unpickled.
   
   The example below illustrates the problem:
   ```python
   In [1]: import pickle, attrs
   
   In [2]: @attrs.define(slots=False)
      ...: class A():
      ...:     a: str
      ...: 
   
   In [3]: @attrs.define(slots=True)
      ...: class B():
      ...:     b: str
      ...: 
   
   In [4]: @attrs.define(slots=False)
      ...: class C(A):
      ...:     c: str
      ...: 
   
   In [5]: @attrs.define(slots=False)
      ...: class D(A):
      ...:     d: str
      ...: 
   
   In [6]: @attrs.define(slots=True)
      ...: class E(B):
      ...:     e: str
      ...: 
   
   In [7]: @attrs.define(slots=False)
      ...: class F(B):
      ...:     f: str
      ...:
   
   In [8]: def test(klazz):
      ...:     try:
      ...:         instance = pickle.loads(pickle.dumps(klazz(**{a.name: a.name for a in attrs.fields(klazz)})))
      ...:         for field in attrs.fields(klazz):
      ...:             getattr(instance, field.name)
      ...:     except AttributeError:
      ...:         print(f"{klazz} failed to unpickle")
      ...:
   
   In [9]: test(A), test(B), test(C), test(D), test(E), test(F)
   <class '__main__.F'> failed to unpickle
   ```
   
   This wasn't caught by unit tests because it only surfaces when `OpenLineageListener` uses a `ProcessPoolExecutor`: to pass objects between processes, Python pickles them.
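   `ProcessPoolExecutor` pickles objects to hand them to worker processes. A plain pickle round trip inside a unit test should therefore approximate that serialization step without spawning any processes. Below is a minimal sketch; `assert_pickle_roundtrip` is a hypothetical helper, not an existing Airflow or OpenLineage utility.

```python
import pickle
from types import SimpleNamespace
from typing import Any, Iterable


def assert_pickle_roundtrip(obj: Any, field_names: Iterable[str]) -> Any:
    """Hypothetical test helper: round-trip ``obj`` through pickle, roughly as
    ProcessPoolExecutor does when handing it to a worker, and check that every
    named field survived with an equal value."""
    restored = pickle.loads(pickle.dumps(obj))
    for name in field_names:
        # getattr raises AttributeError if the field was dropped during pickling.
        assert getattr(restored, name) == getattr(obj, name), f"field {name!r} changed"
    return restored


# Usage with a stand-in object; for an attrs class, the field names could come
# from [f.name for f in attrs.fields(type(obj))].
facet_like = SimpleNamespace(name="job", tasks={"t1": "BashOperator"})
assert_pickle_roundtrip(facet_like, ["name", "tasks"])
```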
   
   ### What you think should happen instead
   
   I suggest we simply set `slots=True` for all facets used in the DAG run state listener hooks, for two reasons:
   1. It avoids migrating to yet another set of facets in the OL client just to change `slots` from `True` to `False`.
   2. Keeping slots enabled for facets does not seem to have a significant impact on performance.
   
   ### How to reproduce
   
   Run Breeze with `--integration openlineage` and the OL provider installed from a wheel. Run an example DAG and check the scheduler logs for an error indicating a pickling failure.
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

