potiuk commented on a change in pull request #12466:
URL: https://github.com/apache/airflow/pull/12466#discussion_r531563105



##########
File path: dev/provider_packages/SETUP_TEMPLATE.py.jinja2
##########
@@ -59,19 +59,6 @@ def do_setup(version_suffix_for_pypi=''):
         packages=find_namespace_packages(
             include=['airflow.providers.{{ PROVIDER_PACKAGE_ID }}',
                      'airflow.providers.{{ PROVIDER_PACKAGE_ID }}.*']),
-        package_data={ '': [
-{% if PROVIDER_PACKAGE_ID == 'amazon' %}
-          "*.json",
-{% elif PROVIDER_PACKAGE_ID == 'cncf.kubernetes' %}
-          "*.yaml",
-{% elif PROVIDER_PACKAGE_ID == 'google' %}
-          "*.yaml",
-          "*.sql",
-{% elif PROVIDER_PACKAGE_ID == 'papermill' %}
-          "*.ipynb",
-{% endif %}
-            ],
-        },

Review comment:
       Are you sure of that? Isn't the MANIFEST mechanism enough for that? 
   
   
https://github.com/apache/airflow/commit/765cbbcd76900fd0777730aaba637058b089ac95#diff-4aee44fe1bb835097b401e18b813bf5181552d25c06a701afa60f9aec4530380R19
  
   
   ```
   {% if PROVIDER_PACKAGE_ID == 'amazon' %}
   include airflow/providers/amazon/aws/hooks/batch_waiters.json
   {% elif PROVIDER_PACKAGE_ID == 'cncf.kubernetes' %}
   include 
airflow/providers/cncf/kubernetes/example_dags/example_spark_kubernetes_spark_pi.yaml
   {% elif PROVIDER_PACKAGE_ID == 'google' %}
   include airflow/providers/google/cloud/example_dags/*.yaml
   include airflow/providers/google/cloud/example_dags/*.sql
   {% elif PROVIDER_PACKAGE_ID == 'papermill' %}
   include airflow/providers/papermill/example_dags/*.ipynb
   {% endif %}
   ```
   
   
   Or does it have to be duplicated for some reason?  Is manifest only for 
sdist and package_data here for whl? Do I guess correctly ? 
   
   It would contradict the documentation though - from 
https://setuptools.readthedocs.io/en/latest/userguide/datafiles.html it looks 
like MANIFEST.in allows to specify includes alone, and package_data is an 
optional counterpart that allows more fine-grain control. 
   
   > Setuptools offers three ways to specify data files to be included in your 
packages. First, you can simply use the include_package_data keyword, e.g.:
   
   ```
   setup(
       ...
       include_package_data=True
   )
   ```
   > This tells setuptools to install any data files it finds in your packages. 
The data files must be specified via the distutils’ MANIFEST.in file.
   
   > If you want finer-grained control over what files are included (for 
example, if you have documentation files in your package directories and want 
to exclude them from installation), then you can also use the package_data 
keyword, e.g.:
   
   Do I read that correctly that we only need includes in MANIFEST.in ? I think 
I checked that the selected files in `google` and `cncf.kubernetes` were added 
when I removed the package_data  entry. But I will double check it again.
   
   The version with both mechanisms cause pretty confusing behaviour and took 
me quite some time to discover why we have provider.yaml in the google and 
cncf.kubernetes packages only.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to