On Tue, May 9, 2023 at 3:14 PM Ruediger Pluem <rpl...@apache.org> wrote: > > > > On 5/9/23 8:01 PM, Eric Covener wrote: > > On Tue, May 9, 2023 at 11:51 AM Ruediger Pluem <rpl...@apache.org> wrote: > >> > >> > >> > >> On 5/9/23 4:33 PM, Yann Ylavic wrote: > >>> On Tue, May 9, 2023 at 2:10 PM Yann Ylavic <ylavic....@gmail.com> wrote: > >>>> > >>>> On Tue, May 9, 2023 at 12:55 PM Ruediger Pluem <rpl...@apache.org> wrote: > >>>>> > >>>>> On 5/9/23 12:16 PM, Eric Covener wrote: > >>>>>> Still getting feedback in the PR about breakage. Any thoughts on > >>>>>> options here, like allowing spaces or encoding rather than failing? > >>>>> > >>>>> Allowing spaces is out of question for me as it creates an invalid > >>>>> request and opens the door to response splitting. At best we > >>>>> could encode automatically. It is really a good question if we could > >>>>> not make BCTLS the default. > >>>> > >>>> BCTLS by default looks fine to me, it changes the behaviour for users > >>>> that (used to) expect/handle decoded spaces in the query-string in > >>>> their scripts, but it's blocked now anyway.. > >>> > >>> Hm, actually we don't really know where the backref is placed (either > >>> in the uri-path or in the query-string), so escaping unconditionally > > Good point. Hence I think the only options left are that people fix their > configurations or > that we scan r->args and encode all spaces. But this leaves the question > open: Encode to what? '+' or %20? > Maybe we can put this job somehow in splitout_queryargs? It would have access > to the RewriteRule flags and > we could decide based on if RULEFLAG_ESCAPENOPLUS is set or not. > > Something along the following (partly pseudocode, not optimized): > > Index: modules/mappers/mod_rewrite.c > =================================================================== > --- modules/mappers/mod_rewrite.c (revision 1909373) > +++ modules/mappers/mod_rewrite.c (working copy) > @@ -854,6 +854,31 @@ > } > } > > + if (global_option_encode_spaces) { > + if (flags & RULEFLAG_ESCAPENOPLUS) { > + char *p1, p2; > + > + p1 = apr_palloc(r->pool, strlen(r->args) * 3 + 1); > + for (p2 = r->args; *p2; p2++) { > + if (*p2 == ' ') { > + strcpy(p1, "%20"); > + p1 += 3; > + } > + else { > + *p1++ = *p2; > + } > + } > + *p1 = '\0'; > + } > + else { > + for (char *pos = r->args; *pos; pos++) { > + if (*pos == ' ') { > + *pos = '+'; > + } > + } > + } > + } > + > rewritelog(r, 3, NULL, "split uri=%s -> uri=%s, args=%s", olduri, > r->filename, r->args ? r->args : "<none>"); > } > > >>> might lead to double-escaping in the uri-path. Maybe it's simpler to > >>> remove the check and leave it to mod_proxy only.. > >> > >> Do we have an example config where this currently breaks that does not > >> forward it to a proxy backend? > > > >>From some of the GH activity, I think there is some mod_proxy_fcgi > > sending to FPM and PHP. In this case the query is in an environment > > variable. > > > > Are spaces over proxy_http really that much of a smoking gun? Wouldn't > > any smuggling/splitting/desynch already be fatal if it's in the > > request line? It can't be terminated early if we only allow spaces. > > You mean if there are just spaces and no control characters? I am not sure if > you can do smuggling without \r and/or \n, but > the HTTP RFC is strict about what is allowed in the request URL and literal > spaces are definitely not allowed. > If we allow them we send a non RFC compliant HTTP request to the backend. I > think we must not do this.
I was assuming the 403s are coming from mod_rewrite, not the proxy modules, and that we'd only change the former. But it's not 100% clear for the people following up in the PR. If it's true, the proxy modules would then still return an error before putting unencoded/decoded spaces into the query.