And one comment as well. I **just** tried to add Python 3.13 support (finally):
And of course - the binary wheels that google-re2 released for Python 3.13 are not enough to use it on Debian Bookworm( 1), and the build process fails:

5.974  Built crcmod==1.7
6.196   × Failed to build `google-re2==1.1.20240702`
6.196   ├─▶ The build backend returned an error
6.196   ╰─▶ Call to `setuptools.build_meta:__legacy__.build_wheel` failed (exit
6.196       status: 1)
6.196
6.196       [stdout]
6.196       running bdist_wheel
6.196       running build
6.196       running build_py
6.196       copying re2/__init__.py -> build/lib.linux-x86_64-cpython-313/re2
6.196       running build_ext
6.196       building '_re2' extension
6.196       g++ -fno-strict-overflow -Wsign-compare -DNDEBUG -g -O3
6.196       -Wall -fPIC -I/root/.cache/uv/builds-v0/.tmp7L9INs/include
6.196       -I/usr/local/include/python3.13 -c _re2.cc -o
6.196       build/temp.linux-x86_64-cpython-313/_re2.o -fvisibility=hidden
6.196
6.196       [stderr]
6.196       _re2.cc:15:10: fatal error: absl/strings/string_view.h: No such file
6.196       or directory
6.196          15 | #include "absl/strings/string_view.h"
6.196             |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
6.196       compilation terminated.
6.196       error: command '/usr/bin/g++' failed with exit code 1
6.196
6.196       hint: This usually indicates a problem with the package or the build
6.196       environment.
6.196   help: `google-re2` (v1.1.20240702) was included because `apache-airflow`
6.196         (v3.0.0.dev0) depends on `google-re2`
------
ERROR: failed to solve: process "/bin/bash -o pipefail -o errexit -o nounset -o nolog -c bash /scripts/docker/install_airflow.sh" did not complete successfully: exit code: 1

So - as far as I am concerned - google-re2 should go away.

On Wed, Feb 19, 2025 at 10:26 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Hello here,
>
> I wanted to discuss something that has bothered me for a while and that I
> think makes a good case for a small breaking change in our public
> interfaces for Airflow 3. A Slack conversation from today (where re2 was
> evidently crashing due to some long regexp passed to it) triggered me to
> call for action.
>
> I would like to propose that we get rid of regexp in all the places where
> we have user-controllable input, i.e. where the user can supply their own
> regexp (for example in REST APIs, APIs used by the UI, the CLI, operator
> APIs). That should allow us to get rid of google-re2 as a dependency
> (google-re2 is a somewhat problematic dependency to have).
>
> *A bit of context:*
>
> We replaced stdlib re usage with re2 in June 2023 as a result of a
> security report we got. In one API call of ours (and likely a few others)
> we used a parameter (root) as a regex - and that parameter was
> controllable by the user, who could potentially cause a ReDoS
> https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
> issue (i.e. crash our webserver by providing a very small, carefully
> crafted regexp in the root parameter). google-re2 (theoretically) protects
> against this kind of issue by limiting the regexp syntax it supports and
> by using a C++ regexp library written by Google that is faster.
>
> So as a result - as we were not sure if there were other cases (it was not
> really obvious that the "root" value we passed is then parsed as a
> regexp) - we used re2 everywhere in Airflow, mandated the use of re2 and
> added a pre-commit check for it. For those who have access to our security
> mailing list, https://lists.apache.org/thread/3mjt86r7djzdg5t22tby431t9cvg8drl
> is the relevant thread.
>
> *Problem*:
>
> That caused a number of problems:
> * google-re2 is a binary package and it does not have binary wheels for
> newer Python versions and some platforms - so it has to be compiled on
> those, and that often fails. For example, we had problems installing
> Airflow on conda:
>
>   * https://github.com/apache/airflow/discussions/32852
>   * https://stackoverflow.com/questions/76701323/trying-to-download-apache-airflow-with-pip-but-error-pops-up-when-building-whee
>   * https://github.com/apache/airflow/issues/32849
>   * https://erogluegemen.medium.com/how-to-install-apache-airflow-on-mac-df9bd5cf1ff8
>   * https://stackoverflow.com/questions/77575842/unable-to-install-google-re2-on-macos-14-1-1
>
> And the list goes on and on. Actually, in my "Beach cleaning" review of the
> project, re2 is high on the list to "forego" - also because it's written
> in C++, and while it is developed at Google, it might have vulnerabilities
> of its own, similar to the original re vulnerabilities.
>
> *Proposed solution:*
>
> I think a better approach would be to make sure that we never, ever parse
> a string passed by a user as a regexp. This means that we remove all the
> regex-parsed user-controlled parameters in Airflow 3 (including the REST
> API, the CLI, the APIs used by the UI, the Task SDK if it's there,
> parameters of the operators etc.). It also means that we should add
> mechanisms (automation/pre-commit, code review awareness among
> maintainers) to prevent adding such cases in the future.
>
> Then we could just get rid of re2 and switch to stdlib re - if we control
> all the regexps, it's safe to use.
>
> I have not done an inventory of those usages, but there are only a few
> places where a regexp can be provided from the client side (and maybe even
> fewer or none in the new UI). We have literally a few CLI commands in the
> Airflow CLI that take a regex as input (the dag and task commands) - and
> there are probably a few more places we will have to look at.
> In all those cases - I THINK - we should be able to use other, safer
> pattern types (`glob` for one is a good alternative in many cases, and
> it's also simpler to grasp even if less flexible).
>
> That means a "breaking change" for all those usages, but one that is
> rather unlikely to cause a great problem, and if we provide globs or other
> patterns as a replacement, it should be quite easy for our users to handle.
>
> So my recommendation would be to:
> * remove regexp from all user input, and add protection so that we do not
> use it there again
> * remove google-re2 as a dependency
>
> BTW. This is a pure embodiment of the famous saying "If you have a problem,
> introduce a regexp, now you have two problems".
>
>
> WDYT?
>
> J.
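
To make the `glob` idea from the quoted proposal a bit more concrete, here is a minimal sketch using only the Python stdlib. The helper name `filter_ids_by_glob` and the sample ids are hypothetical, purely for illustration, and this is not existing Airflow code. Note that `fnmatch` itself translates the glob into a regex internally, so the point is only that the user-facing syntax is much smaller (`*`, `?`, `[seq]`), not that this is a proven ReDoS fix.

```python
import fnmatch


def filter_ids_by_glob(ids, pattern):
    """Return the ids matching a shell-style glob (hypothetical helper)."""
    # fnmatchcase compares case-sensitively on every platform and reads the
    # pattern as a glob (*, ?, [seq]); regex metacharacters such as "+",
    # "(" or "$" are matched literally.
    return [item for item in ids if fnmatch.fnmatchcase(item, pattern)]


# Made-up ids, only to show the behaviour.
dag_ids = ["etl_daily", "etl_hourly", "ml_train", "report.v1"]

print(filter_ids_by_glob(dag_ids, "etl_*"))      # prefix match
print(filter_ids_by_glob(dag_ids, "report.v?"))  # "." is literal, "?" is one char
print(filter_ids_by_glob(dag_ids, "(a+)+$"))     # regex-looking input is just text
```

With something like this, a regex-looking string coming from the CLI or REST API is treated as plain text with a few wildcards, which is the behaviour the proposal is after.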