Everything you said sounds good to me! Thank you!

On Mon, Feb 21, 2022 at 11:08 PM Mat Trudel <m...@geeky.net> wrote:
> José,
>
> Very good. As you suggest, allowing for the manual creation of URI structs
> is the only *strictly* required thing on my wish list - everything else can
> be done externally.
>
> I will build out the validation & normalization logic in a standalone
> library removed from Bandit, as I still do believe that the URI module is
> the correct place for this logic. Perhaps we can revisit this once I’ve had
> a chance to shake out the API structure & refine the various use cases.
>
> I’ll cut a PR against elixir-lang/elixir to update URI's documentation as
> you suggest.
>
> Thanks again!
>
> m.
>
>
> On Feb 21, 2022, at 3:54 PM, José Valim <jose.va...@gmail.com> wrote:
>
> I see, in your case then it sounds like running your own custom
> validation is best, because URI can't provide it out of the box. So it
> seems creating from the %URI{...} is the best option. We can document that
> it is possible, but that the deprecated authority field should not be set.
>
>
> *José Valim*
> https://dashbit.co/
>
>
> On Mon, Feb 21, 2022 at 9:44 PM Mat Trudel <m...@geeky.net> wrote:
>
>> José,
>>
>> You’re correct insofar as the various components in an HTTP request all
>> come from well defined sources (with the possible exception of determining
>> the hostname of a request, which is a bit tricky). What isn’t so obvious,
>> however, is how these may be combined by bad actors to create undesired
>> request URIs. There are a number of attack vectors which can exploit server
>> URI parsing as a basis for further downstream exploits (see [1], [2], [3]).
>>
>> My planned approach to manage this in Bandit is to build URIs roughly
>> as follows:
>>
>> 1. Figure out the scheme used for the request - from the perspective of
>> Bandit, this is either http or https depending on the underlying transport.
>> Situations where this may be overridden by forwarding proxies including
>> `X-` headers are explicitly outside the scope of Bandit; we’re only
>> concerned about explicit HTTP semantics.
>>
>> 2. Determine the hostname & port used for the request (by consulting a
>> specific list of sources in Host headers, authority pseudo headers, and
>> other sources). Construct a URI from scheme, host & port & normalize it.
>> Validate that the resulting path is "/" and that the query string is empty.
>>
>> 3. Determine the path & query string from the request by analyzing the
>> request line / path pseudo header. Construct a URI from this & normalize
>> it. Validate that the resulting scheme, host & port are empty.
>>
>> 4. Merge these two URIs together resulting in one where all fields are
>> known to come from specific sources as above.
>>
>> In truth I suspect that the full answer here is no doubt a lot longer
>> and more nuanced than I’m able to appreciate. My (possibly naive) hope here
>> is to be able to apply some well-defined heuristics to build & normalize a
>> request as early as possible in the request lifecycle, so as to ensure that
>> Plug users can rely on their request parameters at least being valid &
>> sanitized at a protocol level.
>>
>> In terms of specific validations, I would propose that each field be
>> validated against the grammars defined in RFC 3986 [4]. Concerning
>> normalization heuristics, a number are described in section 6 of the same
>> RFC, though I can think of a few others which would likely be good to
>> include. Specific normalization heuristics used should be called out in
>> documentation.
>>
>> The question of whether we would want to expose validation and
>> normalization as discrete functions against a URI isn’t one I have a strong
>> opinion on. My hunch here is that there is probably a wide variety of
>> expectations varying by use case, so it’s probably better to leave
>> them separate.
>>
>> m.
>>
>>
>> [1] https://samcurry.net/abusing-http-path-normalization-and-cache-poisoning-to-steal-rocket-league-accounts/
>> [2] https://i.blackhat.com/USA-19/Thursday/us-19-Birch-HostSplit-Exploitable-Antipatterns-In-Unicode-Normalization.pdf
>> [3] https://community.cloudflare.com/t/faq-url-normalization/259183
>> [4] https://datatracker.ietf.org/doc/html/rfc3986
>>
>>
>> On Feb 20, 2022, at 6:02 AM, José Valim <jose.va...@dashbit.co> wrote:
>>
>> Hi Mat, thanks for starting this discussion!
>>
>> Quick question: don't you want to normalize the URI? I assume they
>> already have to follow a strict format in the HTTP case that is ready to
>> use as is. So doing any sort of normalization would be additional work. We
>> could perform some minimal validation but, if so, what should it be?
>>
>>
>> On Fri, Feb 18, 2022 at 6:29 PM Mat Trudel <m...@geeky.net> wrote:
>>
>>> When implementing an HTTP server, one of the most unspecified parts of
>>> handling a request is the building and canonicalization of the requested
>>> URI. The constituent parts of a request URI are spread out across multiple
>>> sources. For example, the hostname of a request can be any of (possibly
>>> multiple!) Host header(s), an authority pseudo-header in HTTP/2, a
>>> statically configured value for IP-based hosting, or even something derived
>>> from upstream X- headers. Assembling these parts into a canonical request
>>> URI is non-trivial.
>>>
>>> The URI module as currently implemented does not provide supported ways
>>> to construct a URI from constituent parts (though that is changing [1]).
>>> Nor does it provide methods to validate or meaningfully normalize an
>>> extant URI struct. Without these methods, HTTP servers need to resort to
>>> ad hoc methods to build and canonicalize request URIs (see [2], [3]).
>>>
>>> To help alleviate this, it is proposed to add the following changes to
>>> the URI module:
>>>
>>> 1. Explicitly allow for the building of URI structs directly in the
>>> module documentation (subject to warnings about the use of the authority
>>> field).
>>>
>>> 2. Add a normalize(%{})/2 function which will return a normalized
>>> version of an existing URI struct (this can plumb through to
>>> :uri_string.normalize/2 [4]).
>>>
>>> 3. Add an absolute?/1 function which returns whether or not the URI is
>>> absolute (that is, does it contain sufficient information to discretely
>>> represent a complete, unambiguous request).
>>>
>>> Along with the existing new/1 and merge/2 functions, I believe that this
>>> should be sufficient to cleanly implement request URI construction within a
>>> web server such as Bandit. This will allow the web server to determine
>>> where to source the various components of a URI from, while deferring
>>> assembly, normalization and validation of those components to the URI
>>> module where it belongs.
>>>
>>> Subject to debate and approval I'm happy to work this up.
>>>
>>> m.
>>>
>>> [1] https://twitter.com/josevalim/status/1494208355732275200
>>> [2] https://github.com/mtrudel/bandit/blob/main/lib/bandit/http2/stream_task.ex#L101-L113
>>> [3] https://github.com/ninenines/cowboy/blob/8795233c57f1f472781a22ffbf186ce38cc5b049/src/cowboy_http.erl#L490-L553
>>> [4] https://www.erlang.org/doc/man/uri_string.html#normalize-2
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elixir-lang-core" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elixir-lang-core+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elixir-lang-core/8c4e9d5d-f83a-43dc-82e7-171730f19724n%40googlegroups.com
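
The four-step approach Mat describes in the thread (scheme from the transport, host and port from the Host header or authority pseudo-header, path and query from the request target, then a merge) could be sketched roughly as below. This is only an illustration under stated assumptions, not Bandit's code: the module name RequestURISketch and its three functions are hypothetical, it assumes Elixir 1.13 or later for URI.new/1, and it assumes the caller has already picked a single host and port. It lowercases the host but otherwise leaves out the RFC 3986 grammar checks and normalization heuristics discussed in the thread.

```elixir
defmodule RequestURISketch do
  @moduledoc false
  # Hypothetical helper for illustration only; not part of Bandit or of
  # Elixir's URI module.

  # Steps 1 and 2: the scheme comes from the transport, host and port from
  # whichever header source the server trusts. The struct is built directly,
  # leaving the deprecated :authority field unset, so the path is "/" and the
  # query is empty by construction.
  def origin(scheme, host, port)
      when scheme in ["http", "https"] and is_binary(host) and is_integer(port) do
    %URI{scheme: scheme, host: String.downcase(host), port: port, path: "/"}
  end

  # Step 3: parse the request target and accept it only if it carries a path
  # (and optionally a query), with no scheme, userinfo, host or port.
  def target(request_target) when is_binary(request_target) do
    case URI.new(request_target) do
      {:ok, %URI{scheme: nil, userinfo: nil, host: nil, port: nil, path: "/" <> _} = uri} ->
        {:ok, uri}

      _other ->
        {:error, :invalid_request_target}
    end
  end

  # Step 4: merge the relative target onto the origin, so every field of the
  # result comes from a known source.
  def build(scheme, host, port, request_target) do
    with {:ok, relative} <- target(request_target) do
      {:ok, URI.merge(origin(scheme, host, port), relative)}
    end
  end
end
```

A call such as RequestURISketch.build("https", "Example.COM", 443, "/a/b?c=1") would return {:ok, uri}, which can be rendered with URI.to_string/1. Targets that smuggle in a host or scheme (for example "//evil.example/x") are rejected. Note that this simplification also rejects the asterisk and authority forms of the request target, which a real server would need to handle separately.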