Re: regex support RFC

2006-04-03 Thread Mauro Tortonesi
Hrvoje Niksic wrote: Tony Lewis writes: I don't think ,r complicates the command that much. Internally, the only additional work for supporting both globs and regular expressions is a function that converts a glob into a regexp when ,r is not requested. That's a
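A minimal sketch of the glob-to-regexp conversion described in this message. It assumes Python's `fnmatch.translate` as a stand-in for whatever C helper wget would need; `compile_filter` and its `is_regex` parameter are hypothetical names, not anything proposed in the thread.

```python
import fnmatch
import re


def compile_filter(pattern: str, is_regex: bool) -> re.Pattern:
    """Compile a filter pattern (hypothetical helper).

    When the pattern is a glob (the ",r" modifier was not requested),
    it is first translated into an equivalent regular expression.
    """
    if not is_regex:
        # fnmatch.translate turns "*.pdf" into an anchored regex
        pattern = fnmatch.translate(pattern)
    return re.compile(pattern)


# The same rule written once as a glob and once as a regex
glob_rule = compile_filter("*.pdf", is_regex=False)
regex_rule = compile_filter(r".*\.pdf", is_regex=True)
```

With this approach only one matching engine is needed internally; the glob form is just a front end to it.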

Re: regex support RFC

2006-04-03 Thread Mauro Tortonesi
Curtis Hatter wrote: On Friday 31 March 2006 06:52, Mauro Tortonesi: while i like the idea of supporting modifiers like quick (short circuit) and maybe i (case insensitive comparison), i think that (?i:) and (?-i:) constructs would be overkill and rather hard to implement. I figured that the

Re: regex support RFC

2006-03-31 Thread Mauro Tortonesi
Scott Scriven wrote: * Mauro Tortonesi wrote: wget -r --filter=-domain:www-*.yoyodyne.com This appears to match www.yoyodyne.com, www--.yoyodyne.com, www---.yoyodyne.com, and so on, if interpreted as a regex. not really. it would not match www.yoyodyne.com. It would

Re: regex support RFC

2006-03-31 Thread Hrvoje Niksic
Mauro Tortonesi writes: Scott Scriven wrote: * Mauro Tortonesi wrote: wget -r --filter=-domain:www-*.yoyodyne.com This appears to match www.yoyodyne.com, www--.yoyodyne.com, www---.yoyodyne.com, and so on, if interpreted as a regex. not really. it

Re: regex support RFC

2006-03-31 Thread Mauro Tortonesi
Hrvoje Niksic wrote: Mauro Tortonesi writes: Scott Scriven wrote: * Mauro Tortonesi wrote: wget -r --filter=-domain:www-*.yoyodyne.com This appears to match www.yoyodyne.com, www--.yoyodyne.com, www---.yoyodyne.com, and so on, if interpreted as a

Re: regex support RFC

2006-03-31 Thread Mauro Tortonesi
Oliver Schulze L. wrote: Hrvoje Niksic wrote: The regexp APIs found on today's Unix systems might be usable, but unfortunately those are not available on Windows. My personal idea on this is to: enable regex in Unix and disable it on Windows. We all use Unix/Linux and regex is really

Re: regex support RFC

2006-03-31 Thread Mauro Tortonesi
Curtis Hatter wrote: On Thursday 30 March 2006 13:42, Tony Lewis wrote: Perhaps --filter=path,i:/path/to/krs would work. That would look to be the most elegant method. I do hope that the (?i:) and (?-i:) constructs are supported since I may not want the entire path/file to be case

Re: regex support RFC

2006-03-31 Thread Hrvoje Niksic
Mauro Tortonesi writes: wget -r --filter=-domain:www-*.yoyodyne.com This appears to match www.yoyodyne.com, www--.yoyodyne.com, www---.yoyodyne.com, and so on, if interpreted as a regex. not really. it would not match www.yoyodyne.com. Why not? i may be wrong, but if -

Re: regex support RFC

2006-03-31 Thread Mauro Tortonesi
Hrvoje Niksic wrote: Herold Heiko writes: Get the best of both, use a syntax permitting a first-match-exits ACL, single ACE permits several statements ANDed together. Cooking up a simple syntax for users without much regexp experience won't be easy. I assume ACL stands for

Re: regex support RFC

2006-03-31 Thread Wincent Colaiuta
El 31/03/2006, a las 14:37, Hrvoje Niksic escribió: * matches the previous character repeated 0 or more times. This is in contrast to wildcards, where * alone matches any character 0 or more times. (This is part of why regexps are often confusing to people used to the much simpler wildcards.)

Re: regex support RFC

2006-03-31 Thread Hrvoje Niksic
Wincent Colaiuta writes: Are you sure that www-* matches www? Yes. As far as I know www-* matches one w, another w, a third w, a hyphen, then 0 or more hyphens. That would be www--* or www-+.
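The point can be checked directly. This is a small sketch using Python's `re` module as a stand-in for whichever regex library wget would adopt; the dots are escaped here so that only the behaviour of `-*` versus `-+` is being tested.

```python
import re

# "-*" means zero or more hyphens, so the bare "www" host matches too
zero_or_more = re.compile(r"www-*\.yoyodyne\.com")
# "-+" (equivalently "--*") requires at least one hyphen
one_or_more = re.compile(r"www-+\.yoyodyne\.com")

hosts = ["www.yoyodyne.com", "www-.yoyodyne.com", "www---.yoyodyne.com"]

star_results = {h: bool(zero_or_more.fullmatch(h)) for h in hosts}
plus_results = {h: bool(one_or_more.fullmatch(h)) for h in hosts}
```

Under these assumptions `star_results` is true for all three hosts, while `plus_results` is false only for `www.yoyodyne.com`.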

Re: regex support RFC

2006-03-31 Thread Mauro Tortonesi
Hrvoje Niksic wrote: Wincent Colaiuta writes: Are you sure that www-* matches www? Yes. hrvoje is right. try this perl script: #!/usr/bin/perl -w use strict; my @strings = ("www-.yoyodyne.com", "www.yoyodyne.com"); foreach my $str (@strings) { $str =~

Re: regex support RFC

2006-03-31 Thread Oliver Schulze L.
Mauro Tortonesi wrote: for consistency and to avoid maintenance problems, i would like wget to have the same behavior on windows and unix. please, notice that if we implemented regex support only on unix, windows binaries of wget built with cygwin would have regex support but native binaries

RE: regex support RFC

2006-03-31 Thread Tony Lewis
Mauro Tortonesi wrote: no. i was talking about regexps. they are more expressive and powerful than simple globs. i don't see what's the point in supporting both. The problem is that users who are expecting globs will try things like --filter=-file:*.pdf rather than --filter=-file:.*\.pdf. In

Re: regex support RFC

2006-03-31 Thread Hrvoje Niksic
Tony Lewis writes: Mauro Tortonesi wrote: no. i was talking about regexps. they are more expressive and powerful than simple globs. i don't see what's the point in supporting both. The problem is that users who are expecting globs will try things like

RE: regex support RFC

2006-03-31 Thread Tony Lewis
Hrvoje Niksic wrote: But that misses the point, which is that we *want* to make the more expressive language, already used elsewhere on Unix, the default. I didn't miss the point at all. I'm trying to make a completely different one, which is that regular expressions will confuse most users

Re: regex support RFC

2006-03-31 Thread Hrvoje Niksic
Tony Lewis writes: I didn't miss the point at all. I'm trying to make a completely different one, which is that regular expressions will confuse most users (even if you tell them that the argument to --filter is a regular expression). Well, most users will probably not use

RE: regex support RFC

2006-03-31 Thread Tony Lewis
Hrvoje Niksic wrote: I don't see a clear line that connects --filter to glob patterns as used by the shell. I want to list all PDFs in the shell: ls -l *.pdf. I want a filter to keep all PDFs: --filter=+file:*.pdf. Note that *.pdf is not a valid regular expression even though it's what most

Re: regex support RFC

2006-03-31 Thread Curtis Hatter
On Friday 31 March 2006 06:52, Mauro Tortonesi: while i like the idea of supporting modifiers like quick (short circuit) and maybe i (case insensitive comparison), i think that (?i:) and (?-i:) constructs would be overkill and rather hard to implement. I figured that the (?i:) and (?-i:)

Re: regex support RFC

2006-03-31 Thread Scott Scriven
* Mauro Tortonesi wrote: I'm hoping for ... a raw type in addition to file, domain, etc. do you mean you would like to have a regex class working on the content of downloaded files as well? Not exactly. (details below) i don't like your raw proposal as it is

RE: regex support RFC

2006-03-31 Thread Sandhu, Ranjit
Mauro Tortonesi wrote: no. i was talking about regexps. they are more expressive and powerful than simple globs. i don't see what's the point in supporting both. The problem is that users who are expecting globs will try

RE: regex support RFC

2006-03-30 Thread Herold Heiko
Hrvoje Niksic wrote: I don't think such a thing is necessary in practice, though; remember that even if you don't escape the dot, it still matches the (intended) dot, along with other characters. So for quick-and-dirty usage not escaping dots will just work, and those who
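To illustrate the trade-off with unescaped dots, here is a short sketch, again assuming Python's `re` semantics as a stand-in. The file names are made up for the example.

```python
import re

# Unescaped dot: matches the intended "." but also any other character
loose = re.compile(r".*.pdf")
# Escaped dot: matches only a literal "."
strict = re.compile(r".*\.pdf")

names = ["report.pdf", "reportxpdf"]

loose_hits = [n for n in names if loose.fullmatch(n)]
strict_hits = [n for n in names if strict.fullmatch(n)]
```

The loose pattern never misses the intended files; it only accepts some extra names, which is why it is usually harmless for casual use.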

Re: regex support RFC

2006-03-30 Thread Hrvoje Niksic
Herold Heiko writes: Hrvoje Niksic wrote: I don't think such a thing is necessary in practice, though; remember that even if you don't escape the dot, it still matches the (intended) dot, along with other characters. So for quick-and-dirty usage not

Re: regex support RFC

2006-03-30 Thread Jim Wright
On Thu, 30 Mar 2006, Mauro Tortonesi wrote: I do like the [file|path|domain]: approach. very nice and flexible. (and would be a huge help to one specific need I have!) I suggest also including an any option as a shortcut for putting the same pattern in all three options. do you

Re: regex support RFC

2006-03-30 Thread Curtis Hatter
On Wednesday 29 March 2006 12:05, you wrote: we also have to reach consensus on the filtering algorithm. for instance, should we simply require that a url passes all the filtering rules to allow its download (just like the current -A/R behaviour), or should we instead adopt a short circuit
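The two candidate algorithms can be contrasted in a few lines. This is only a sketch under stated assumptions: rules are modelled as `(accept, pattern)` pairs, "passes all rules" is taken to mean matching every accept rule and no reject rule, and the short-circuit variant lets the first matching rule decide with a default of accept. None of these details were settled in the thread, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# (accept?, compiled pattern); True models "+", False models "-"
Rule = Tuple[bool, re.Pattern]


def passes_all(url: str, rules: List[Rule]) -> bool:
    """URL must match every accept rule and must match no reject rule."""
    return all(bool(rx.search(url)) == accept for accept, rx in rules)


def first_match(url: str, rules: List[Rule], default: bool = True) -> bool:
    """Short circuit: the first rule that matches decides the outcome."""
    for accept, rx in rules:
        if rx.search(url):
            return accept
    return default


rules = [
    (True, re.compile(r"\.pdf$")),     # accept PDFs
    (False, re.compile(r"/drafts/")),  # reject anything under /drafts/
]

draft_pdf = "http://www.example.com/drafts/a.pdf"
index_page = "http://www.example.com/index.html"
```

The two strategies disagree on both sample URLs: the draft PDF is rejected by `passes_all` but accepted by `first_match`, and the index page is rejected by `passes_all` but falls through to the default in `first_match`. With short circuiting, rule order becomes part of the semantics.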

RE: regex support RFC

2006-03-30 Thread Tony Lewis
How many keywords do we need to provide maximum flexibility on the components of the URI? (I'm thinking we need five.) Consider http://www.example.com/path/to/script.cgi?foo=bar
  --filter=uri:regex could match against any part of the URI
  --filter=domain:regex could match against www.example.com
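One way the example URI could be split into filterable components is sketched below. Only `uri` and `domain` appear in the visible part of this message; the `path`, `file` and `query` keys, and the way the path is divided between directory and file name, are assumptions made for illustration. `uri_components` is a hypothetical helper.

```python
import posixpath
from urllib.parse import urlsplit


def uri_components(uri: str) -> dict:
    """Split a URI into the pieces a filter keyword might match against."""
    parts = urlsplit(uri)
    # Assumption: "path" is the directory part, "file" the last segment
    directory, filename = posixpath.split(parts.path)
    return {
        "uri": uri,
        "domain": parts.hostname or "",
        "path": directory,
        "file": filename,
        "query": parts.query,
    }


components = uri_components("http://www.example.com/path/to/script.cgi?foo=bar")
```

Each filter keyword would then select which entry of this dictionary its regex is applied to.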

Re: regex support RFC

2006-03-30 Thread Scott Scriven
* Mauro Tortonesi wrote: wget -r --filter=-domain:www-*.yoyodyne.com This appears to match www.yoyodyne.com, www--.yoyodyne.com, www---.yoyodyne.com, and so on, if interpreted as a regex. It would most likely also match www---zyoyodyneXcom. Perhaps you want glob patterns

RE: regex support RFC

2006-03-30 Thread Tony Lewis
Curtis Hatter wrote: Also any way to add modifiers to the regexes? Perhaps --filter=path,i:/path/to/krs would work. Tony
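A rough sketch of how a filter value with an optional modifier list might be parsed. It assumes the `[+|-][file|path|domain]:REGEXP` shape discussed elsewhere in the thread, extended with the `,i` modifier suggested here; treating a missing sign as "accept" and rejecting unknown modifiers are assumptions, and `parse_filter` and `Filter` are hypothetical names.

```python
import re
from typing import NamedTuple


class Filter(NamedTuple):
    accept: bool       # True for "+" (or no sign), False for "-"
    target: str        # "file", "path" or "domain"
    regex: re.Pattern  # compiled pattern, with modifiers applied


# sign, keyword, optional ",modifiers", then ":" and the regex itself
SPEC = re.compile(
    r"(?P<sign>[+-]?)(?P<target>file|path|domain)"
    r"(?:,(?P<mods>[a-z]+))?:(?P<regex>.*)",
    re.DOTALL,
)


def parse_filter(spec: str) -> Filter:
    m = SPEC.fullmatch(spec)
    if m is None:
        raise ValueError(f"bad filter spec: {spec!r}")
    mods = m.group("mods") or ""
    unknown = set(mods) - {"i"}
    if unknown:
        raise ValueError(f"unknown modifiers: {sorted(unknown)}")
    flags = re.IGNORECASE if "i" in mods else 0
    return Filter(
        accept=m.group("sign") != "-",
        target=m.group("target"),
        regex=re.compile(m.group("regex"), flags),
    )


krs = parse_filter("path,i:/path/to/krs")
yoyodyne = parse_filter("-domain:www-*.yoyodyne.com")
```

Keeping the modifiers in the prefix means the regex body needs no special syntax of its own, which is the appeal of this form over inline constructs such as (?i:).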

Re: regex support RFC

2006-03-30 Thread Curtis Hatter
On Thursday 30 March 2006 13:42, Tony Lewis wrote: Perhaps --filter=path,i:/path/to/krs would work. That would look to be the most elegant method. I do hope that the (?i:) and (?-i:) constructs are supported since I may not want the entire path/file to be case (in)?sensitive =), but that will

Re: regex support RFC

2006-03-30 Thread Oliver Schulze L.
Hrvoje Niksic wrote: The regexp APIs found on today's Unix systems might be usable, but unfortunately those are not available on Windows. My personal idea on this is to: enable regex in Unix and disable it on Windows. We all use Unix/Linux and regex is really useful. I think not having

Re: regex support RFC

2006-03-30 Thread Scott Scriven
* [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: wget -e robots=off -r -N -k -E -p -H http://www.gnu.org/software/wget/ soon leads to non wget related links being downloaded, eg. http://www.gnu.org/graphics/agnuhead.html In that particular case, I think --no-parent would solve the problem.

Re: regex support RFC

2006-03-29 Thread Jim Wright
what definition of regexp would you be following? or would this be making up something new? I'm not quite understanding the comment about the comma and needing escaping for literal commas. this is true for any character in the regexp language, so why the special concern for comma? I do like

Re: regex support RFC

2006-03-29 Thread Hrvoje Niksic
Mauro Tortonesi writes: for instance, the syntax for --filter presented above is basically the following: --filter=[+|-][file|path|domain]:REGEXP I think there should also be url for filtering on the entire URL. People have been asking for that kind of thing a lot over the

Re: regex support RFC

2006-03-29 Thread Hrvoje Niksic
Jim Wright writes: what definition of regexp would you be following? or would this be making up something new? It wouldn't be new; Mauro is definitely referring to regexps as normally understood. The regexp APIs found on today's Unix systems might be usable, but