Greetings from the Gothenburg BSP. This is the output I get when reproducing this issue:
---8<----->8-----
--- FAIL: TestEncodeNecessaryEscapesAll (0.00s)
    purell_test.go:761: Got error parse http://host/ � net/url: invalid control character in URL
FAIL
exit status 1
FAIL	_/tmp/golang-github-puerkitobio-purell-1.1.0	0.003s
---8<----->8-----

net/url's Parse now explicitly checks that you don't pass in "invalid" (non-encoded) URLs, and rejects them:
https://sources.debian.org/src/golang-1.11/1.11.6-1/src/net/url/url.go/?hl=498#L498

The NormalizeURLString helper function starts out by passing the URL string to url.Parse:
https://sources.debian.org/src/golang-github-puerkitobio-purell/1.1.0-1/purell.go/#L153

The TestDecodeUnnecessaryEscapesAll function creates a URL with control characters and passes it to NormalizeURLString to try to get it normalized/escaped.

I'd say the design of the NormalizeURLString function is broken, and thus it's not obvious to me how to fix it. If you have a non-normalized string you want encoded, you should pass it in as something that doesn't need to be parsed, i.e. use the NormalizeURL helper function, which takes a *url.URL. Removing the NormalizeURLString function, however, would be an API/ABI break, as it's publicly exported (and used by Hugo and Kubernetes)...

Maybe open-coding some homebrew URL parsing to fall back on could be done to keep the function around... It sounds bad to think you know better than net/url how to parse a URL, though.

I'm attaching a patch which "fixes" this issue, but it's most likely a stupid and wrong thing to do; it's just designed to make the test suite pass. As mentioned, I think the NormalizeURLString function is incorrectly designed and there's no proper way to fix it (so it should be deprecated in favour of NormalizeURL). Thus I'm intentionally *not* tagging the patch, as it is rather a proof of concept.

Regards,
Andreas Henriksson
diff -uriNp golang-github-puerkitobio-purell-1.1.0/purell.go golang-github-puerkitobio-purell-1.1.0-fixed/purell.go
--- golang-github-puerkitobio-purell-1.1.0/purell.go	2016-11-15 03:49:42.000000000 +0100
+++ golang-github-puerkitobio-purell-1.1.0-fixed/purell.go	2019-04-07 16:00:06.039745498 +0200
@@ -12,6 +12,7 @@ import (
 	"sort"
 	"strconv"
 	"strings"
+	"errors"
 
 	"github.com/PuerkitoBio/urlesc"
 	"golang.org/x/net/idna"
@@ -147,10 +148,43 @@ func MustNormalizeURLString(u string, f
 	return result
 }
 
+func myURLParse(u string) (*url.URL, error) {
+	// first try to parse the url as normal and return it if successful
+	parsed, err := url.Parse(u)
+	if err == nil {
+		return parsed, nil
+	}
+
+	// path possibly contains control characters, which url.Parse
+	// doesn't allow. Try to parse url without path and then add it.
+
+	// find third / and assume that's where path starts.
+	parts := strings.SplitN(u, "/", 4)
+
+	var noPathURL string
+	if len(parts) != 4 {
+		return nil, errors.New("Failed to find start of path in url")
+	}
+	noPathURL = strings.Join(parts[:3], "/")
+
+	parsed, err = url.Parse(noPathURL)
+	if err != nil {
+		return nil, err
+	}
+
+	pathquery := strings.SplitN(parts[3], "#", 2)
+	parsed.Path = pathquery[0]
+	if len(pathquery) > 1 {
+		parsed.Fragment = pathquery[1]
+	}
+
+	return parsed, nil
+}
+
 // NormalizeURLString returns the normalized string, or an error if it can't be parsed into an URL object.
 // It takes an URL string as input, as well as the normalization flags.
 func NormalizeURLString(u string, f NormalizationFlags) (string, error) {
-	parsed, err := url.Parse(u)
+	parsed, err := myURLParse(u)
 	if err != nil {
 		return "", err
 	}
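For anyone who wants to try the fallback without rebuilding the package, the patch's myURLParse can be exercised on its own. This is just the function transcribed into a runnable sketch; the input URL is a made-up example, and it shows the caveat that the fallback drops the path's leading slash (String() happens to add it back):

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// myURLParse mirrors the attached patch: fall back to parsing the URL
// without its path when url.Parse rejects raw control characters.
func myURLParse(u string) (*url.URL, error) {
	parsed, err := url.Parse(u)
	if err == nil {
		return parsed, nil
	}

	// Assume the path starts after the third "/".
	parts := strings.SplitN(u, "/", 4)
	if len(parts) != 4 {
		return nil, errors.New("Failed to find start of path in url")
	}

	parsed, err = url.Parse(strings.Join(parts[:3], "/"))
	if err != nil {
		return nil, err
	}

	// Split off a fragment, if any; everything before "#" becomes the
	// path (note: without its leading "/", and with any query kept in it).
	pathquery := strings.SplitN(parts[3], "#", 2)
	parsed.Path = pathquery[0]
	if len(pathquery) > 1 {
		parsed.Fragment = pathquery[1]
	}

	return parsed, nil
}

func main() {
	u, err := myURLParse("http://host/\x1ffoo#frag")
	fmt.Println(u, err) // the control character ends up escaped in u.String()
}
```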