potiuk commented on PR #53149: URL: https://github.com/apache/airflow/pull/53149#issuecomment-3090359190
> Cool, now the other thing that remains for parity is not being able to call another shared library from within an existing one. Example: being able to use timezone in the "logging/structlog.py" file without using try..except etc.

Solved that too (sort of) :). The current version of the fix is in https://github.com/apache/airflow/pull/53506, with both problems solved (the second has limitations, but likely reasonable ones).

First of all, there is no need to force-include for wheels. Only the `sdist` force-include is needed. If you include it in wheels, it generates "duplicate" warnings, and wheel building automatically follows symlinks and copies the files anyway.

Now... "include a shared library in a shared library" also works. It has one limitation.

How I've done that:

* I symlinked the "logging" library in "airflow.sdk._shared"
* then (inception!!!) I symlinked "timezone.py" from the shared timezone in ... "shared/logging/src/airflow_logging/logging/_shared"
* then in task-sdk I force-included logging to "_shared/logging" and timezone.py to "_shared/logging/_shared/timezone.py"

Consequences:

* in the logging library you need to import timezone with `from ._shared import timezone` (so far straightforward and reasonable!)
* limitation: you can only include timezone ONCE. You cannot have timezone included separately (or at least I have not yet found an equally easy way of doing it).
* this means that if you want to use timezone from task-sdk you need to `from ._shared.logging._shared import timezone`, but this is not as unreasonable as you might think: you probably do not want to have timezone included twice in two different places in your final distribution

Limitation:

* you cannot have "diamond" dependencies:

```
          /--- logging ---\
timezone                    task-sdk
          \--- config ----/
```

This is a **real limitation** that we would have to live with (unless we find a better way), but I **think** it might also help us make much better choices to keep our modules as independent as possible and (yes, I am repeating that ugly word) improve cyclomatic complexity. One of the properties of low cyclomatic complexity is that the "thing" / "module" you have is either "being used" or is a "user", and being both "used" and "user" is an indication of high cyclomatic complexity. In this case logging and config are both users and being used, and I am pretty sure that in this particular case (for example) a structure similar to:

```
logging  ---\
config   ---- task-sdk
timezone ---/
```

is way better, and you could design both logging and config in such a way that "task-sdk" would have to pass whatever they need from timezone from the "top" (basically inject the dependency).

But I can also think a bit harder and try to solve it differently if we think it is absolutely crucial. I think it would be great to see what kind of "split" we would want and which modules we would like to have "shared", and then we could probably see how it can be mapped.
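
To make the "inject from the top" structure above a bit more concrete, here is a minimal sketch. All names in it (`utcnow`, `LogFormatter`, `Config`, `build_components`) are hypothetical stand-ins and not actual Airflow or task-sdk APIs. The only point is that the logging and config pieces never import timezone themselves and receive what they need from their user:

```python
from datetime import datetime, timezone
from typing import Callable


# Hypothetical stand-in for the shared "timezone" module.
# In this layout only the top level (task-sdk) would import it.
def utcnow() -> datetime:
    return datetime.now(timezone.utc)


# Hypothetical stand-in for the shared "logging" library.
# It does not import timezone; it only receives a clock callable.
class LogFormatter:
    def __init__(self, now: Callable[[], datetime]) -> None:
        self._now = now

    def format(self, message: str) -> str:
        return f"{self._now().isoformat()} {message}"


# Hypothetical stand-in for the shared "config" library, same idea.
class Config:
    def __init__(self, now: Callable[[], datetime]) -> None:
        self.loaded_at = now()


# Hypothetical "task-sdk" level: the only place that knows all three
# pieces and wires them together, so there is no diamond.
def build_components(now: Callable[[], datetime] = utcnow):
    return LogFormatter(now), Config(now)
```

With this shape, timezone only has to be included once, at the task-sdk level, and tests can pass a fixed clock instead of the real one.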
