Doesn't bother me. On Wed, Feb 19, 2025 at 2:55 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> And one comment as well. I **just** tried to attempt to add Python 3.13 > support (finally): > > And of course - the binary wheels google-re2 released for Python 3.13 are > not enough to be used on Debian Bookworm( 1) and the build process fails: > > 5.974 Built crcmod==1.7 > 6.196 × Failed to build `google-re2==1.1.20240702` > 6.196 ├─▶ The build backend returned an error > 6.196 ╰─▶ Call to `setuptools.build_meta:__legacy__.build_wheel` failed > (exit > 6.196 status: 1) > 6.196 > 6.196 [stdout] > 6.196 running bdist_wheel > 6.196 running build > 6.196 running build_py > 6.196 copying re2/__init__.py -> > build/lib.linux-x86_64-cpython-313/re2 > 6.196 running build_ext > 6.196 building '_re2' extension > 6.196 g++ -fno-strict-overflow -Wsign-compare -DNDEBUG -g -O3 > 6.196 -Wall -fPIC -I/root/.cache/uv/builds-v0/.tmp7L9INs/include > 6.196 -I/usr/local/include/python3.13 -c _re2.cc -o > 6.196 build/temp.linux-x86_64-cpython-313/_re2.o -fvisibility=hidden > 6.196 > 6.196 [stderr] > 6.196 _re2.cc:15:10: fatal error: absl/strings/string_view.h: No such > file > 6.196 or directory > 6.196 15 | #include "absl/strings/string_view.h" > 6.196 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ > 6.196 compilation terminated. > 6.196 error: command '/usr/bin/g++' failed with exit code 1 > 6.196 > 6.196 hint: This usually indicates a problem with the package or the > build > 6.196 environment. > 6.196 help: `google-re2` (v1.1.20240702) was included because > `apache-airflow` > 6.196 (v3.0.0.dev0) depends on `google-re2` > ------ > ERROR: failed to solve: process "/bin/bash -o pipefail -o errexit -o > nounset -o nolog -c bash /scripts/docker/install_airflow.sh" did not > complete successfully: exit code: 1 > > So - as far as I am concerned - google-re2 should go away > > On Wed, Feb 19, 2025 at 10:26 AM Jarek Potiuk <ja...@potiuk.com> wrote: > > > Hello Here, > > > > I wanted to discuss something that bothered me for a while and I think > > makes a good case for a small breaking change in our public interfaces > for > > Airflow 3 a slack conversation from today (where re2 was evidently > crashing > > due to some long regexp passed to it) triggered me to call for action. > > > > I would like to propose that we get rid of regexp in all the places where > > we have user controllable input where user can supply a regexp (for > example > > in REST APIs, APIs used by UI, CLI, operator APIs) and other places where > > the user can provide their own regexp now. And that should allow us to > get > > rid of google-re2 as a dependency (google-re2 is a somewhat problematic > > dependency to have). > > > > *A bit of context:* > > > > We've replaced stdlib re usage with re2 in June 2023 as a result of a > > security report we got. In one (and likely few other API calls of ours) > we > > used parameter that we passed (root) as regex - and that parameter was > > controllable by the user - and they could potentially cause ReDOS > > > https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS > > issue (i.e. crash our webserver by providing a very small carefully > crafted > > regexp in root parameter). google-re2 is (theoretically) protecting > > about this kind of issues b y limiting regexp supported and using c > regexp > > library written by google that is faster. > > > > So as a result - as we were not sure if there are other cases (it was not > > really obvious that "root" passed message is then parsed as regexp - we > > used re2 everywhere in airflow, mandated use of re2 and added pre-commit > > checking it. For those who have access to our security mailing list > > https://lists.apache.org/thread/3mjt86r7djzdg5t22tby431t9cvg8drl is the > > relevant thread. > > > > *Problem*: > > > > That caused a number of problems: > > * google-re2 is binary package and it does not have binary wheel for > newer > > Python versions and some platforms - and it has to be compiled on those, > > and it often fails, we had for example problem with installing airflow on > > conda > > > > * https://github.com/apache/airflow/discussions/32852 > > * > > > https://stackoverflow.com/questions/76701323/trying-to-download-apache-airflow-with-pip-but-error-pops-up-when-building-whee > > * https://github.com/apache/airflow/issues/32849 > > * > > > https://erogluegemen.medium.com/how-to-install-apache-airflow-on-mac-df9bd5cf1ff8 > > * > > > https://stackoverflow.com/questions/77575842/unable-to-install-google-re2-on-macos-14-1-1 > > > > And the list goes on-and-on. Actually in my "Beach cleaning" review of > > project - re2 is high on the list to "forego" - also because it's written > > in C, and while it is developed in google, it might be that there are > > similar regexp vulnerabilities as the original re vulnerabilities > > > > *Proposed solution:* > > > > I think a better approach would be to make sure that we never, ever parse > > a string passed by a user as regexp. This means that we remove all the > > regex-parsed user-controlled parameters in Airflow 3 (including REST API, > > CLI, the APIs used by the UI, Task SDK if it's there, parameters of the > > operators etc.). But also that means that we should add mechanisms > > (automation/pre-commit. code review awareness with maintainers) - to > > prevent adding such cases in the future. > > > > Then we could just get rid of re2 and switch to stdlib re - if we control > > all the regexp, it's safe to use. > > > > I have not done an inventory of those usages, but there are a few places > > where regexp can be provided from a client side (and maybe even fewer or > > none) in the new UI. We have literally a few CLI commands in Airflow CLI > > that take regex as input (dag and task command) - and there are > probably a > > few more places we will have to look at. In all those cases - I THINK - > we > > should be able to use other, safer pattern types (`glob` for one is a > good > > alternative in many cases and it's also a simpler one to grasp even if > less > > flexible). > > > > That means "breaking change" for all those usages, but one that is rather > > unlikely to cause a great problem and if we provide globs or other > patterns > > as replacement, it should be quite easy to handle by our users. > > > > So my recommendation would be to: > > * remove and add protection so that we do not use regexp for any user > input > > * remove google-re2 as dependency > > > > BTW. This s a pure embodiment of the famous saying "If you have a > problem, > > introduce a regexp, now you have two problems". > > > > > > WDYT? > > > > J. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >