Package: moreutils
Version: 0.18
Severity: wishlist
Is there a 'grep' that works on the text of web pages? If not, it
would be a good thing to have.
I'm thinking something that acts like this ad hoc function, only better:
# usage: URLgrep pattern URL
# (ad hoc grep switches return first instance of 'pattern'
# in URL and next line, with numbered lines.)
URLgrep() {
    wget -o /dev/null --output-document=- "$2" |
        html2text -ascii -nobs |
        grep -in -A 1 -m 1 "$1"
}
Notes: the 'grep' switches there are a frill. It'd be better to make a
script that could pass any 'grep' switches through. 'html2text' comes
from its own package. Other features that would be nice: it should check
that the URL really _is_ HTML data, and do something reasonable if it's
not. Since bandwidth can be expensive, it would be good if, when given
the '-m' switch, it stopped reading data once enough matches were found.
(Or does 'grep' already do that?)
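For illustration, here is a rough sketch of such a script as a shell
function. It is only a sketch under some assumptions of mine, not a
finished tool: the name 'urlgrep' is a placeholder, the URL is assumed to
be the last argument (everything before it goes to 'grep' untouched), and
the HTML check is a crude sniff for an <html> tag or doctype in the first
4 KB rather than a look at the Content-Type header. Data that doesn't
look like HTML is searched as is.

```shell
# usage: urlgrep [grep options] pattern URL
# Sketch only. Assumes the URL is the last argument; everything before
# it (switches and pattern) is handed to grep unchanged.
# The body runs in a subshell, so its variables do not leak out.
urlgrep() (
    if [ "$#" -lt 2 ]; then
        echo "usage: urlgrep [grep options] pattern URL" >&2
        exit 2
    fi

    # Peel the last argument off as the URL; keep the rest in "$@".
    left=$#
    for arg in "$@"; do
        shift
        left=$((left - 1))
        if [ "$left" -eq 0 ]; then
            url=$arg
        else
            set -- "$@" "$arg"
        fi
    done

    tmp=$(mktemp) || exit 2

    # Fetch the whole page quietly into the temporary file.
    if ! wget -o /dev/null --output-document="$tmp" "$url"; then
        echo "urlgrep: cannot fetch $url" >&2
        rm -f "$tmp"
        exit 2
    fi

    # Crude HTML check (an assumption, not a real content-type test):
    # look for an <html> tag or a doctype near the top of the data.
    if head -c 4096 "$tmp" | grep -Eqi '<html|<!doctype html'; then
        html2text -ascii -nobs < "$tmp" | grep "$@"
    else
        # Not HTML: search the raw data as is.
        grep "$@" < "$tmp"
    fi
    status=$?

    rm -f "$tmp"
    exit "$status"
)
```

With this, 'urlgrep -in -A 1 -m 1 pattern URL' should behave like the ad
hoc function above. GNU 'grep' documents '-m' as stopping reading after
the given number of matching lines, but this sketch downloads the whole
page before searching, so it does nothing for the bandwidth concern.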
Rationale: sometimes there's no URL (or no obvious URL) for the text
you want to quote or point to. This would help, and such text quotes
don't change, unlike the URLs they came from; using it would be a
boon for future message archives, where so many URL references are now
dead links.
Names: 'URLgrep' is a lame name. 'webgrep'? 'wgrep'? ...
Hope this helps...
-- System Information:
Debian Release: 4.0
APT prefers unstable
APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: i386 (i686)
Shell: /bin/sh linked to /bin/dash
Kernel: Linux 2.6.16-2-686
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968) (ignored: LC_ALL set to C)
Versions of packages moreutils depends on:
ii libc6 2.3.6.ds1-7 GNU C Library: Shared libraries
ii perl 5.8.8-6.1 Larry Wall's Practical Extraction
moreutils recommends no packages.
-- no debconf information