Hi team,

The curl_url_get man page [3] says it *normalizes* retrieved URLs. Normalizing in this context means that curl does its best to return a single, consistent representation of a URL even if you provide different variations as input.

Normalizing helps applications, for example when comparing URLs, or otherwise makes URL handling more consistent.

This claim turned out to be false [1]: several details are not normalized in the latest libcurl version, and I am working on a PR [2] to address the shortcomings.

Normalizing URLs is less straightforward than it may sound. A naive version would decode every URL part, re-encode each of them, and assemble a full URL from the re-encoded pieces.

This however would break URLs in multiple ways: for example, '/' in the path part would be encoded to %2F and '=' in the query part would be encoded to %3D, so it cannot be done that simply. Each part more or less has its own set of properties and characters to take into account and treat specially. Not to mention that it is simply more work, requiring several more memory allocations to get done.

Also, a user might not need or want this normalization. Maybe we need a flag to enable or disable it?

Before I complete this work and risk wasting time going down the wrong rabbit hole, let me know if you have any thoughts, opinions or feedback in this area.

[1] = https://github.com/curl/curl/issues/16829
[2] = https://github.com/curl/curl/pull/16841
[3] = https://curl.se/libcurl/c/curl_url_get.html

--

 / daniel.haxx.se || https://rock-solid.curl.dev
--
Unsubscribe: https://lists.haxx.se/mailman/listinfo/curl-library
Etiquette:   https://curl.se/mail/etiquette.html