Greetings from the Gothenburg BSP.

This is the output I get when reproducing this issue:

---8<----->8-----

--- FAIL: TestEncodeNecessaryEscapesAll (0.00s)
    purell_test.go:761: Got error parse http://host/�: net/url: invalid control character in URL
FAIL
exit status 1
FAIL    _/tmp/golang-github-puerkitobio-purell-1.1.0    0.003s

---8<----->8-----


net/url's Parse now explicitly checks that you don't pass in
"invalid" (non-encoded) URLs and rejects them:
https://sources.debian.org/src/golang-1.11/1.11.6-1/src/net/url/url.go/?hl=498#L498

The NormalizeURLString helper function starts out by passing the URL
string to url.Parse:
https://sources.debian.org/src/golang-github-puerkitobio-purell/1.1.0-1/purell.go/#L153

The TestEncodeNecessaryEscapesAll function creates a URL string with
control characters and passes it to NormalizeURLString to try to get it
normalized/escaped.

I'd say the design of the NormalizeURLString function is broken, and
thus it's not obvious to me how to fix it. If you have a non-normalized
string you want encoded, you should pass it in as something that
doesn't need to be parsed, i.e. use the NormalizeURL helper function,
which takes a *url.URL.

Removing the NormalizeURLString function would however be an API/ABI
break, as it's publicly exported (and used by Hugo and Kubernetes)...

Maybe open-coding some homebrew URL parsing to fall back on could keep
the function around, but it sounds bad to assume you know better than
net/url how to parse a URL.

I'm attaching a patch which "fixes" this issue, but it's most likely a
stupid and wrong way to solve it; it's just designed to make the
testsuite pass. As mentioned, I think the NormalizeURLString function
is incorrectly designed and there's no real way to fix it (so it should
be deprecated in favour of NormalizeURL).
Thus I'm intentionally *not* tagging this bug "patch", as this is
rather a proof of concept.

Regards,
Andreas Henriksson
diff -uriNp golang-github-puerkitobio-purell-1.1.0/purell.go golang-github-puerkitobio-purell-1.1.0-fixed/purell.go
--- golang-github-puerkitobio-purell-1.1.0/purell.go	2016-11-15 03:49:42.000000000 +0100
+++ golang-github-puerkitobio-purell-1.1.0-fixed/purell.go	2019-04-07 16:00:06.039745498 +0200
@@ -12,6 +12,7 @@ import (
 	"sort"
 	"strconv"
 	"strings"
+	"errors"
 
 	"github.com/PuerkitoBio/urlesc"
 	"golang.org/x/net/idna"
@@ -147,10 +148,43 @@ func MustNormalizeURLString(u string, f
 	return result
 }
 
+func myURLParse(u string) (*url.URL, error) {
+	// First try to parse the URL normally and return it if successful.
+	parsed, err := url.Parse(u)
+	if err == nil {
+		return parsed, nil
+	}
+
+	// The path possibly contains control characters, which url.Parse
+	// no longer allows. Try to parse the URL without its path, then re-add it.
+
+	// Find the third "/" and assume that's where the path starts.
+	parts := strings.SplitN(u, "/", 4)
+
+	var noPathURL string
+	if len(parts) != 4 {
+		return nil, errors.New("failed to find start of path in url")
+	}
+	noPathURL = strings.Join(parts[:3], "/")
+
+	parsed, err = url.Parse(noPathURL)
+	if err != nil {
+		return nil, err
+	}
+
+	pathquery := strings.SplitN(parts[3], "#", 2)
+	parsed.Path = "/" + pathquery[0] // restore the leading "/" consumed by SplitN
+	if len(pathquery) > 1 {
+		parsed.Fragment = pathquery[1]
+	}
+
+	return parsed, nil
+}
+
 // NormalizeURLString returns the normalized string, or an error if it can't be parsed into an URL object.
 // It takes an URL string as input, as well as the normalization flags.
 func NormalizeURLString(u string, f NormalizationFlags) (string, error) {
-	parsed, err := url.Parse(u)
+	parsed, err := myURLParse(u)
 	if err != nil {
 		return "", err
 	}
