Hi team,

The curl_url_get man page [3] says it *normalizes* retrieved URLs. Normalizing in this context means that curl does its best to return a single, consistent representation of a URL even if you provide different variations as input.

Normalizing helps applications, for example when comparing URLs, or otherwise makes URL handling more consistent.

This claim turned out to be false [1]: several details are not normalized in the latest libcurl version, and I am working on a PR [2] to address the shortcomings.

Normalizing URLs is less straightforward than it may sound. A naive version would decode every URL part, re-encode each of them, and assemble a full URL from the re-encoded pieces.

This however would break URLs in multiple ways: for example, '/' in the path part would be encoded to %2F and '=' in the query part would be encoded to %3D, so it cannot be done that simply. Each part more or less has its own set of properties and characters to take into account and treat specially. Not to mention that it is simply more work, requiring several more memory allocations to get done.

Also, a user might not need or want this normalization. Maybe we need a flag to enable or disable it?

Before I complete this work and risk wasting time going down the wrong rabbit hole, let me know if you have any thoughts, opinions or feedback in this area.

[1] = https://github.com/curl/curl/issues/16829
[2] = https://github.com/curl/curl/pull/16841
[3] = https://curl.se/libcurl/c/curl_url_get.html

--

 / daniel.haxx.se || https://rock-solid.curl.dev
--
Unsubscribe: https://lists.haxx.se/mailman/listinfo/curl-library
Etiquette:   https://curl.se/mail/etiquette.html