amoghrajesh commented on code in PR #53149: URL: https://github.com/apache/airflow/pull/53149#discussion_r2224478791
########## shared/README.md: ########## @@ -0,0 +1,91 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + --> + +# Shared Python Code for Airflow Components + +This folder contains code that is shared across two or more of the Airflow distributions, such as airflow-core and task-sdk. + +## Be Thoughtful about what you add under here + +Not every piece of code that is used in two distributions should be automatically placed in one of the shared libraries, and sometimes "just duplicate it" is the right approach to take. For example if it's just a 5 or 10 line function and it's used in two places, it might be easier to future developers to understand if the function is in two places. + +There is no hard rules about what should or shouldn't be in these libraries, so try to apply your best judgement. + +## Mechanics of sharing + +The primary mechanism by which we share code is to use in-repo symlinks. This means that there is a single copy of the code exists in the repo (no need for a pre-commit to update other copies, resulting in larger diffs). Review Comment: ```suggestion The primary mechanism by which we share code is to use in-repo symlinks. 
This means that there is a single copy of the code existing in the repo (no need for a pre-commit to update other copies, resulting in larger diffs). ``` ########## shared/README.md: ########## @@ -0,0 +1,91 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + --> + +# Shared Python Code for Airflow Components + +This folder contains code that is shared across two or more of the Airflow distributions, such as airflow-core and task-sdk. + +## Be Thoughtful about what you add under here + +Not every piece of code that is used in two distributions should be automatically placed in one of the shared libraries, and sometimes "just duplicate it" is the right approach to take. For example if it's just a 5 or 10 line function and it's used in two places, it might be easier to future developers to understand if the function is in two places. + +There is no hard rules about what should or shouldn't be in these libraries, so try to apply your best judgement. + +## Mechanics of sharing + +The primary mechanism by which we share code is to use in-repo symlinks. This means that there is a single copy of the code exists in the repo (no need for a pre-commit to update other copies, resulting in larger diffs). 
+ +The primary reason we are using this sharing approach, rather than the perhaps more traditional approach of depending on a shared distribution from PyPI is around compatibility. Lets say that we have a library to parse AirflowConfig (including all the sources, env, `_cmd` handling, etc etc.). If that code is used by two distributions that are installed in the same Python environment, then either we introduce another source of "version hell" or we have to make every change both backwards and forwards compatible. + +Instead by using an approach similar to vendoring, where each dist ships with a copy of the code of the version it know it works with it allows them to co-exist in the same env without problems, and also reduces the cognitive load on developers when making changes to the shared libs -- as long as it passes the CI it should be good (and we don't have to test an ever increasing matrix of versions to ensure compatibility.) + +### Shared libraries _must_ use relative imports for other shared libraries** + +The one caveat to this is is due to the side-effect that these shared modules are going to be imported from different locations (for example `airflow._shared.timezones.timezone` and `airflow.sdk._shared.timezones.timezone`) then any imports inside the shared code to other parts of shared libraries _must_ make use of relative imports, and possibly needs `# noqa: TID252` + +For example, to use the shared timezone library from another shared library, lets say `shared/logging/src/airflow_shared/logging/config.py` you would have + +```python +from ..timezones.timezone import is_naive # noqa: TID252 +``` + +This is different to the rest of the airflow codebase where relative imports are not allowed. 
+ +### Shared libraries should be grouped one level beneath `airflow_shared/` + +In order to make both the relative imports above function and to provider simple structure, we want to arrange things in single named chunks, where the folder name is also the "shared library name". For example Review Comment: ```suggestion In order to make both the relative imports above function and to provide simple structure, we want to arrange things in single named chunks, where the folder name is also the "shared library name". For example ``` ########## task-sdk/src/airflow/sdk/timezone.py: ########## @@ -0,0 +1,49 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +from __future__ import annotations Review Comment: The public API facade looks nice too. ########## shared/README.md: ########## @@ -0,0 +1,91 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. 
You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + --> + +# Shared Python Code for Airflow Components + +This folder contains code that is shared across two or more of the Airflow distributions, such as airflow-core and task-sdk. + +## Be Thoughtful about what you add under here + +Not every piece of code that is used in two distributions should be automatically placed in one of the shared libraries, and sometimes "just duplicate it" is the right approach to take. For example if it's just a 5 or 10 line function and it's used in two places, it might be easier to future developers to understand if the function is in two places. + +There is no hard rules about what should or shouldn't be in these libraries, so try to apply your best judgement. + +## Mechanics of sharing + +The primary mechanism by which we share code is to use in-repo symlinks. This means that there is a single copy of the code exists in the repo (no need for a pre-commit to update other copies, resulting in larger diffs). + +The primary reason we are using this sharing approach, rather than the perhaps more traditional approach of depending on a shared distribution from PyPI is around compatibility. Lets say that we have a library to parse AirflowConfig (including all the sources, env, `_cmd` handling, etc etc.). If that code is used by two distributions that are installed in the same Python environment, then either we introduce another source of "version hell" or we have to make every change both backwards and forwards compatible. 
+ +Instead by using an approach similar to vendoring, where each dist ships with a copy of the code of the version it know it works with it allows them to co-exist in the same env without problems, and also reduces the cognitive load on developers when making changes to the shared libs -- as long as it passes the CI it should be good (and we don't have to test an ever increasing matrix of versions to ensure compatibility.) + +### Shared libraries _must_ use relative imports for other shared libraries** + +The one caveat to this is is due to the side-effect that these shared modules are going to be imported from different locations (for example `airflow._shared.timezones.timezone` and `airflow.sdk._shared.timezones.timezone`) then any imports inside the shared code to other parts of shared libraries _must_ make use of relative imports, and possibly needs `# noqa: TID252` + +For example, to use the shared timezone library from another shared library, lets say `shared/logging/src/airflow_shared/logging/config.py` you would have + +```python +from ..timezones.timezone import is_naive # noqa: TID252 +``` + +This is different to the rest of the airflow codebase where relative imports are not allowed. Review Comment: The relative-import rule here could be enforced with a pre-commit check. ########## shared/timezones/src/airflow_shared/timezones/timezone.py: ########## @@ -0,0 +1,318 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +from __future__ import annotations Review Comment: Should we be adding an entry to deprecate the `airflow.utils.timezone` in `airflow-core/src/airflow/utils/__init__.py`?
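
On the question of deprecating `airflow.utils.timezone`: one possible shape for such an entry is a module-level `__getattr__` (PEP 562) on the old package that emits a `DeprecationWarning` and forwards to the new location. The following is only a rough sketch under assumptions. `install_deprecated_redirects`, `demo_utils`, and `demo_new_timezone` are hypothetical stand-ins so the example runs without Airflow installed; they are not Airflow's actual helpers or module paths.

```python
import importlib
import sys
import types
import warnings


def install_deprecated_redirects(module: types.ModuleType, redirects: dict[str, str]) -> None:
    """Attach a PEP 562 ``__getattr__`` to ``module`` that warns and forwards.

    ``redirects`` maps an old attribute name to the import path of its replacement.
    Hypothetical helper for illustration only.
    """

    def __getattr__(name: str):
        if name not in redirects:
            raise AttributeError(f"module {module.__name__!r} has no attribute {name!r}")
        new_path = redirects[name]
        warnings.warn(
            f"{module.__name__}.{name} is deprecated; use {new_path} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return importlib.import_module(new_path)

    # Stored in the module's __dict__, so it is consulted only when normal
    # attribute lookup on the module fails.
    module.__getattr__ = __getattr__


# Stand-in for the new home of the timezone code (hypothetical name).
new_mod = types.ModuleType("demo_new_timezone")
new_mod.LOCATION = "new"
sys.modules["demo_new_timezone"] = new_mod

# Stand-in for the old package that should keep working with a warning.
old_pkg = types.ModuleType("demo_utils")
sys.modules["demo_utils"] = old_pkg
install_deprecated_redirects(old_pkg, {"timezone": "demo_new_timezone"})
```

With this in place, accessing `demo_utils.timezone` returns the new module and warns once per access, while unknown attributes still raise `AttributeError`.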
