Tony Lewis wrote:
> Micah Cowan wrote:
> 
>> On expanding current URI acc/rej matches to allow matching against query
>> strings, I've been considering how we might enable/disable this
>> functionality, with an eye toward backwards compatibility.
> 
> What about something like --match-type=TYPE (with accepted values of all,
> hash, path, search)?
> 
> For the URL http://www.domain.com/path/to/name.html?a=true#content
> 
> all would match against the entire string
> hash would match against "content"
> path would match against "path/to/name.html"
> search would match against "a=true"
> 
> For backward compatibility the default should be --match-type=path.
> 
> I thought about having "host" as an option, but that duplicates another
> option.

As does path (up to the final /).

Would "hash" really be useful, ever? It's never part of the request to
the server, so it's really more "context" to the URL than a real part of
the URL, as far as requests go. Perhaps that sort of thing could best
wait for when we allow custom URL-parsers/filters.

Also, I don't like the name "search" overly much, as that's a very
limited description of the much more general use of query strings.

But differentiating between three or more different match types tilts me
much more strongly toward some sort of shorthand, like the explicit need
for \?; with three types, perhaps we'd just use some special prefix on
patterns to indicate which sort of match we want (":q:" for query strings,
":a:" for all, or whatever), to save on prefixing each different type of
match with --match-type (or just using "all" for everything).
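
To make the prefix idea concrete, here's a rough sketch in Python (not
Wget's C, and not an existing Wget feature). The ":q:" and ":a:" prefixes
are the ones floated above; the function names, the default of matching
the path, and the use of fnmatch-style wildcards are all assumptions for
illustration only.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Hypothetical prefixes from the discussion; nothing here exists in Wget.
PREFIXES = {":q:": "query", ":a:": "all"}


def split_prefix(pattern):
    """Return (match_type, bare_pattern); unprefixed patterns default to 'path'."""
    for prefix, match_type in PREFIXES.items():
        if pattern.startswith(prefix):
            return match_type, pattern[len(prefix):]
    return "path", pattern


def url_matches(pattern, url):
    """Apply a shell-style wildcard pattern to the URL part its prefix selects."""
    match_type, bare = split_prefix(pattern)
    parts = urlsplit(url)
    if match_type == "query":
        target = parts.query             # e.g. "a=true"
    elif match_type == "all":
        target = url                     # the entire string
    else:
        target = parts.path.lstrip("/")  # e.g. "path/to/name.html"
    return fnmatchcase(target, bare)


url = "http://www.domain.com/path/to/name.html?a=true#content"
print(url_matches("*.html", url))           # True: matched against the path
print(url_matches(":q:a=*", url))           # True: matched against the query
print(url_matches(":q:b=*", url))           # False: no such query parameter
print(url_matches(":a:*domain.com*", url))  # True: matched against everything
```

The point of the prefix is just that each pattern carries its own match
type, so a single accept/reject list could mix all three kinds.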

OTOH, regex support is easy enough to add to Wget, now that we're using
gnulib; we could just leave wildcards the way they are, and introduce
regexes that match against the entire URL. Then query strings are
'\?.*foo=bar' (or, for the really pedantic,
'\?([^?]*&)?foo=bar(&[^?]*)?$').
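
For anyone who wants to see the difference between those two regexes, a
quick check in Python (whose regex syntax agrees with POSIX extended
syntax for these particular patterns; the example.com URLs are made up
for the demonstration):

```python
import re

# The two patterns quoted above, unchanged.
LOOSE = re.compile(r'\?.*foo=bar')
STRICT = re.compile(r'\?([^?]*&)?foo=bar(&[^?]*)?$')

# Made-up URLs, chosen to show where the loose pattern over-matches.
urls = [
    "http://example.com/page?foo=bar",          # exact parameter
    "http://example.com/page?a=1&foo=bar&b=2",  # parameter in the middle
    "http://example.com/page?xfoo=bar",         # different key ending in "foo"
    "http://example.com/page?foo=barn",         # different value starting "bar"
]

for url in urls:
    loose = bool(LOOSE.search(url))
    strict = bool(STRICT.search(url))
    print(f"{url}  loose={loose}  strict={strict}")
```

The loose pattern accepts all four; the pedantic one accepts only the
first two, since it insists that foo=bar be a whole key=value pair
delimited by ?, &, or the end of the URL.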

That last one, though, highlights how cumbersome it is to do proper
matching against typical HTML form-generated query strings (it's not
really even possible with wildcards). Perhaps a more appropriate
pattern-matcher specifically for query strings would be a good idea.
It's probably enough to do something like --query-='action=Edit', where
there's an implied '\?([^?]*&)?' before, and '(&[^?]*)?$' after.
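
A minimal sketch of that wrapping, again in Python and purely
illustrative: the function names are hypothetical, and treating the
user's text as a regex fragment (rather than a literal string or a
wildcard) is my assumption, since nothing above settles that.

```python
import re


def query_pattern(fragment):
    """Wrap a user-supplied fragment with the implied prefix and suffix.

    The fragment is placed in a non-capturing group so that any
    alternation inside it stays confined between the two implied parts.
    """
    return re.compile(r'\?([^?]*&)?(?:' + fragment + r')(&[^?]*)?$')


def query_matches(fragment, url):
    """True if the URL's query string contains the fragment as a whole pair."""
    return query_pattern(fragment).search(url) is not None


# Made-up URLs for illustration.
print(query_matches("action=Edit", "http://example.com/w.php?title=Main&action=Edit"))
print(query_matches("action=Edit", "http://example.com/w.php?action=Edit&title=Main"))
print(query_matches("action=Edit", "http://example.com/w.php?action=EditSection"))
print(query_matches("action=Edit", "http://example.com/w.php?transaction=Edit"))
```

The first two calls return True and the last two False, so the user only
has to write 'action=Edit' and gets the pedantic behavior for free.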

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
