Doesn't bother me.

On Wed, Feb 19, 2025 at 2:55 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> And one comment as well. I **just** tried to attempt to add Python 3.13
> support (finally):
>
> And of course - the binary wheels google-re2 released for Python 3.13 are
> not enough to be used on Debian Bookworm( 1) and the build process fails:
>
> 5.974       Built crcmod==1.7
> 6.196   × Failed to build `google-re2==1.1.20240702`
> 6.196   ├─▶ The build backend returned an error
> 6.196   ╰─▶ Call to `setuptools.build_meta:__legacy__.build_wheel` failed
> (exit
> 6.196       status: 1)
> 6.196
> 6.196       [stdout]
> 6.196       running bdist_wheel
> 6.196       running build
> 6.196       running build_py
> 6.196       copying re2/__init__.py ->
> build/lib.linux-x86_64-cpython-313/re2
> 6.196       running build_ext
> 6.196       building '_re2' extension
> 6.196       g++ -fno-strict-overflow -Wsign-compare -DNDEBUG -g -O3
> 6.196       -Wall -fPIC -I/root/.cache/uv/builds-v0/.tmp7L9INs/include
> 6.196       -I/usr/local/include/python3.13 -c _re2.cc -o
> 6.196       build/temp.linux-x86_64-cpython-313/_re2.o -fvisibility=hidden
> 6.196
> 6.196       [stderr]
> 6.196       _re2.cc:15:10: fatal error: absl/strings/string_view.h: No such
> file
> 6.196       or directory
> 6.196          15 | #include "absl/strings/string_view.h"
> 6.196             |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 6.196       compilation terminated.
> 6.196       error: command '/usr/bin/g++' failed with exit code 1
> 6.196
> 6.196       hint: This usually indicates a problem with the package or the
> build
> 6.196       environment.
> 6.196   help: `google-re2` (v1.1.20240702) was included because
> `apache-airflow`
> 6.196         (v3.0.0.dev0) depends on `google-re2`
> ------
> ERROR: failed to solve: process "/bin/bash -o pipefail -o errexit -o
> nounset -o nolog -c bash /scripts/docker/install_airflow.sh" did not
> complete successfully: exit code: 1
>
> So - as far as I am concerned - google-re2 should go away
>
> On Wed, Feb 19, 2025 at 10:26 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > Hello Here,
> >
> > I wanted to discuss something that bothered me for a while and I think
> > makes a good case for a small breaking change in our public interfaces
> for
> > Airflow 3 a slack conversation from today (where re2 was evidently
> crashing
> > due to some long regexp passed to it) triggered me to call for action.
> >
> > I would like to propose that we get rid of regexp in all the places where
> > we have user controllable input where user can supply a regexp (for
> example
> > in REST APIs, APIs used by UI, CLI, operator APIs) and other places where
> > the user can provide their own regexp now. And that should allow us to
> get
> > rid of google-re2 as a dependency (google-re2 is a somewhat problematic
> > dependency to have).
> >
> > *A bit of context:*
> >
> > We've replaced stdlib re usage with re2 in June 2023 as a result of a
> > security report we got. In one (and likely few other API calls of ours)
> we
> > used parameter that we passed (root) as regex - and that parameter was
> > controllable by the user - and they could potentially cause ReDOS
> >
> https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
> > issue (i.e. crash our webserver by providing a very small carefully
> crafted
> > regexp in root parameter). google-re2 is (theoretically) protecting
> > about this kind of issues b y limiting regexp supported and using c
> regexp
> > library written by google that is faster.
> >
> > So as a result - as we were not sure if there are other cases (it was not
> > really obvious that "root" passed message is then parsed as regexp - we
> > used re2 everywhere in airflow, mandated use of re2 and added pre-commit
> > checking it. For those who have access to our security mailing list
> > https://lists.apache.org/thread/3mjt86r7djzdg5t22tby431t9cvg8drl is the
> > relevant thread.
> >
> > *Problem*:
> >
> > That caused a number of problems:
> > * google-re2 is binary package and it does not have binary wheel for
> newer
> > Python versions and some platforms - and it has to be compiled on those,
> > and it often fails, we had for example problem with installing airflow on
> > conda
> >
> > * https://github.com/apache/airflow/discussions/32852
> > *
> >
> https://stackoverflow.com/questions/76701323/trying-to-download-apache-airflow-with-pip-but-error-pops-up-when-building-whee
> > * https://github.com/apache/airflow/issues/32849
> > *
> >
> https://erogluegemen.medium.com/how-to-install-apache-airflow-on-mac-df9bd5cf1ff8
> > *
> >
> https://stackoverflow.com/questions/77575842/unable-to-install-google-re2-on-macos-14-1-1
> >
> > And the list goes on-and-on. Actually in my "Beach cleaning" review of
> > project - re2 is high on the list to "forego" - also because it's written
> > in C, and while it is developed in google, it might be that there are
> > similar regexp vulnerabilities as the original re vulnerabilities
> >
> > *Proposed solution:*
> >
> > I think a better approach would be to make sure that we never, ever parse
> > a string passed by a user as regexp. This means that we remove all the
> > regex-parsed user-controlled parameters in Airflow 3 (including REST API,
> > CLI, the APIs used by the UI, Task SDK if it's there, parameters of the
> > operators etc.). But also that means that we should add mechanisms
> > (automation/pre-commit. code review awareness with maintainers) - to
> > prevent adding such cases in the future.
> >
> > Then we could just get rid of re2 and switch to stdlib re - if we control
> > all the regexp, it's safe to use.
> >
> > I have not done an inventory of those usages, but there are a few places
> > where regexp can be provided from a client side (and maybe even fewer or
> > none) in the new UI. We have literally a few CLI commands in Airflow CLI
> > that take regex as input (dag and task command)  - and there are
> probably a
> > few more places we will have to look at. In all those cases - I THINK -
> we
> > should be able to use other, safer pattern types (`glob` for one is a
> good
> > alternative in many cases and it's also a simpler one to grasp even if
> less
> > flexible).
> >
> > That means "breaking change" for all those usages, but one that is rather
> > unlikely to cause a great problem and if we provide globs or other
> patterns
> > as replacement, it should be quite easy to handle by our users.
> >
> > So my recommendation would be to:
> > * remove and add protection so that we do not use regexp for any user
> input
> > * remove google-re2 as dependency
> >
> > BTW. This s a pure embodiment of the famous saying "If you have a
> problem,
> > introduce a regexp, now you have two problems".
> >
> >
> > WDYT?
> >
> > J.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
>

Reply via email to