kaxil commented on PR #53149:
URL: https://github.com/apache/airflow/pull/53149#issuecomment-3089469086
I tried out both approaches, comparing the symlink approach (PR #53417) with
this vendoring approach. Here are my findings, detailed in the following
AI-assisted report:
---
While symlinks initially seemed appealing for developer experience, my
testing revealed a **fundamental incompatibility** with Python packaging:
```bash
# Direct wheel build works (misleading!)
$ hatch build -t wheel # ✅ Success - symlinks are resolved
# Standard build process fails
$ hatch build # ❌ Fails at wheel stage
FileNotFoundError: [Errno 2] No such file or directory: '/opt/airflow/task-sdk/src/airflow/sdk/timezone.py'
```
**Why this happens:**
1. `hatch build` creates an sdist first, which preserves symlinks as-is
2. When the wheel is then built from that sdist in an isolated environment, the symlink targets (which live outside the package directory and were never packed) don't exist
3. The build fails completely
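To see the mechanism without hatch, here is a minimal Python sketch (POSIX only, since it creates symlinks) that mimics the sdist-then-wheel round trip with the standard-library `tarfile` module. An sdist is a gzipped tar archive, and `tarfile` stores symlinks as links rather than copying the files they point to. The directory layout and the `simulate_sdist_roundtrip` helper are hypothetical illustrations, not hatch internals.

```python
import os
import tarfile
import tempfile
from pathlib import Path


def simulate_sdist_roundtrip() -> tuple[bool, bool]:
    """Pack a package dir holding a symlink (like an sdist), unpack it elsewhere
    (like an isolated wheel build), and report (is_symlink, target_exists)."""
    with tempfile.TemporaryDirectory() as src_tmp, tempfile.TemporaryDirectory() as build_tmp:
        repo = Path(src_tmp)

        # Hypothetical layout: shared code lives outside the package directory.
        shared = repo / "shared" / "timezone.py"
        shared.parent.mkdir(parents=True)
        shared.write_text("UTC = 'UTC'\n")

        pkg = repo / "task-sdk" / "src" / "sdk"
        pkg.mkdir(parents=True)
        # Relative symlink from inside the package to the shared module.
        (pkg / "timezone.py").symlink_to(os.path.relpath(shared, pkg))

        # Archive only the package directory, as an sdist would.
        sdist = repo / "task-sdk.tar.gz"
        with tarfile.open(sdist, "w:gz") as tar:
            # dereference=False (the default): links are stored as links.
            tar.add(repo / "task-sdk", arcname="task-sdk")

        # Unpack into a fresh directory, as an isolated build would.
        with tarfile.open(sdist) as tar:
            if hasattr(tarfile, "data_filter"):  # safe filter on newer Pythons
                tar.extractall(build_tmp, filter="data")
            else:
                tar.extractall(build_tmp)

        module = Path(build_tmp) / "task-sdk" / "src" / "sdk" / "timezone.py"
        return module.is_symlink(), module.exists()
```

On a POSIX system this returns `(True, False)`: the module path survives as a link, but its target was never packed, which is the same dangling-link situation behind the `FileNotFoundError` above.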
This is consistent across all standard Python packaging tools (setuptools, hatchling, flit) and makes symlinks unsuitable for production use unless we move or resolve the symlinks (replace each link with a copy of its target) in our `breeze prepare-airflow-distributions` command or in a custom hatch build script.
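A minimal sketch of that resolve step in plain Python, assuming the links point at files or directories that exist in the checkout. `find_symlinks`, `resolve_symlinks`, and `demo` are hypothetical helpers, not existing breeze or hatch APIs; wiring them into `breeze prepare-airflow-distributions` or a hatch build hook is left out.

```python
import os
import shutil
import tempfile
from pathlib import Path


def find_symlinks(root: Path) -> list[Path]:
    """Collect every symlink under ``root`` (to files or directories)."""
    links = []
    # os.walk does not descend into symlinked directories by default, but it
    # still lists them in ``dirnames``, so they are picked up here.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                links.append(path)
    return sorted(links)


def resolve_symlinks(root: Path) -> list[Path]:
    """Replace each symlink under ``root`` with a real copy of its target."""
    replaced = []
    for link in find_symlinks(root):
        target = link.resolve(strict=True)  # raises FileNotFoundError if dangling
        link.unlink()
        if target.is_dir():
            shutil.copytree(target, link)  # symlinks=False: copies file contents
        else:
            shutil.copy2(target, link)
        replaced.append(link)
    return replaced


def demo() -> tuple[list[str], bool, str]:
    """Build a throwaway tree with one symlinked module and resolve it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        shared = root / "shared"
        shared.mkdir()
        (shared / "timezone.py").write_text("UTC = 'UTC'\n")
        pkg = root / "pkg"
        pkg.mkdir()
        (pkg / "timezone.py").symlink_to(Path("..") / "shared" / "timezone.py")

        replaced = resolve_symlinks(pkg)
        module = pkg / "timezone.py"
        return [p.name for p in replaced], module.is_symlink(), module.read_text()
```

The trade-off is that this copy step lives outside the standard build flow, so every way of producing a distribution has to remember to run it first.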
I recognize that using vendoring for **internal code sharing** is unconventional: tools like `vendoring` were designed for external dependencies (for example, pip vendors `requests`, `urllib3`, and others). However, given Python's packaging limitations, it's still our best option:
✅ **Packaging Compatibility**: Works with all standard Python build tools
✅ **True Independence**: Each package is self-contained with its dependencies
✅ **Community Precedent**: Successfully used by pip and setuptools (though for external dependencies)
✅ **No Custom Build Tools**: Uses standard Python packaging ecosystem
✅ **Clear in Code Reviews**: The `_vendor` path makes shared code obvious
Instead of `_vendor` (which implies external dependencies), we could use `_shared`, which makes it clearer that this is internal shared code; it also looks like we already have alignment on that name.
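Conceptually, vendoring internal code into `_shared` means copying the shared package into each distribution and rewriting its absolute imports to point at the copy. The sketch below illustrates only that idea: the package name `sharedlib`, the prefix `mypkg._shared`, and both helpers are hypothetical, and the regex rewriting covers just `from sharedlib[...] import ...` and bare `import sharedlib` lines. The `vendoring` tool has its own configuration and rewriting rules; this is not its implementation.

```python
import re
import shutil
from pathlib import Path


def rewrite_imports(source: str, name: str, prefix: str) -> str:
    """Point absolute imports of package ``name`` at its copy under ``prefix``.

    Only ``from name[.sub] import ...`` and bare ``import name`` lines are
    handled; aliases, ``import name.sub`` and dynamic imports are not.
    """
    pkg = re.escape(name)
    # from name import x / from name.sub import x
    source = re.sub(
        rf"^([ \t]*)from {pkg}(\.[\w.]+)?([ \t]+import\b)",
        rf"\1from {prefix}.{name}\2\3",
        source,
        flags=re.MULTILINE,
    )
    # import name  ->  from prefix import name
    source = re.sub(
        rf"^([ \t]*)import {pkg}[ \t]*$",
        rf"\1from {prefix} import {name}",
        source,
        flags=re.MULTILINE,
    )
    return source


def vendor_package(src: Path, shared_dir: Path, name: str, prefix: str) -> Path:
    """Copy package ``src`` into ``shared_dir/name`` and rewrite its imports.

    ``prefix`` is the dotted import path of ``shared_dir``, e.g. "mypkg._shared".
    """
    target = shared_dir / name
    if target.exists():
        shutil.rmtree(target)  # always start from a clean copy
    shutil.copytree(src, target)
    for py_file in target.rglob("*.py"):
        py_file.write_text(rewrite_imports(py_file.read_text(), name, prefix))
    return target
```

For example, `rewrite_imports("from sharedlib import tz\n", "sharedlib", "mypkg._shared")` returns `"from mypkg._shared.sharedlib import tz\n"`. Relative imports inside the copied package keep working unchanged, and because each run starts from a clean copy, the vendored output depends only on the source tree.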
Despite being unconventional, vendoring remains the best approach because:
- Technical constraints haven't changed (symlinks still fail)
- No better alternative exists in Python's packaging ecosystem
- It works reliably and deterministically, even if it is unusual
- Sometimes novel problems require novel solutions
The key is clear documentation explaining why we're using this pattern and
how it differs from traditional vendoring.
--
This is an automated message from the Apache Git Service.