potiuk commented on a change in pull request #12466:
URL: https://github.com/apache/airflow/pull/12466#discussion_r531563105
##########
File path: dev/provider_packages/SETUP_TEMPLATE.py.jinja2
##########
@@ -59,19 +59,6 @@ def do_setup(version_suffix_for_pypi=''):
packages=find_namespace_packages(
include=['airflow.providers.{{ PROVIDER_PACKAGE_ID }}',
'airflow.providers.{{ PROVIDER_PACKAGE_ID }}.*']),
- package_data={ '': [
-{% if PROVIDER_PACKAGE_ID == 'amazon' %}
- "*.json",
-{% elif PROVIDER_PACKAGE_ID == 'cncf.kubernetes' %}
- "*.yaml",
-{% elif PROVIDER_PACKAGE_ID == 'google' %}
- "*.yaml",
- "*.sql",
-{% elif PROVIDER_PACKAGE_ID == 'papermill' %}
- "*.ipynb",
-{% endif %}
- ],
- },
Review comment:
Are you sure of that? Isn't the MANIFEST mechanism enough for that?
https://github.com/apache/airflow/commit/765cbbcd76900fd0777730aaba637058b089ac95#diff-4aee44fe1bb835097b401e18b813bf5181552d25c06a701afa60f9aec4530380R19
```
{% if PROVIDER_PACKAGE_ID == 'amazon' %}
include airflow/providers/amazon/aws/hooks/batch_waiters.json
{% elif PROVIDER_PACKAGE_ID == 'cncf.kubernetes' %}
include
airflow/providers/cncf/kubernetes/example_dags/example_spark_kubernetes_spark_pi.yaml
{% elif PROVIDER_PACKAGE_ID == 'google' %}
include airflow/providers/google/cloud/example_dags/*.yaml
include airflow/providers/google/cloud/example_dags/*.sql
{% elif PROVIDER_PACKAGE_ID == 'papermill' %}
include airflow/providers/papermill/example_dags/*.ipynb
{% endif %}
```
Or does it have to be duplicated for some reason? Is manifest only for
sdist and package_data here for whl? Do I guess correctly ?
It would contradict the documentation though - from
https://setuptools.readthedocs.io/en/latest/userguide/datafiles.html it looks
like MANIFEST.in allows to specify includes alone, and package_data is an
optional counterpart that allows more fine-grain control.
> Setuptools offers three ways to specify data files to be included in your
packages. First, you can simply use the include_package_data keyword, e.g.:
```
setup(
...
include_package_data=True
)
```
> This tells setuptools to install any data files it finds in your packages.
The data files must be specified via the distutils’ MANIFEST.in file.
> If you want finer-grained control over what files are included (for
example, if you have documentation files in your package directories and want
to exclude them from installation), then you can also use the package_data
keyword, e.g.:
Do I read that correctly that we only need includes in MANIFEST.in ? I think
I checked that the selected files in `google` and `cncf.kubernetes` were added
when I removed the package_data entry. But I will double check it again.
The version with both mechanisms cause pretty confusing behaviour and took
me quite some time to discover why we have provider.yaml in the google and
cncf.kubernetes packages only.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]