"Trent W. Buck via luv-main" writes:
> Tim Connors wrote:
>> msy? That looks nifty
>
> http://www.cyber.com.au/~twb/.bin/msy
> http://www.cyber.com.au/~twb/.bin/foldr
>
> Not using XSLT I'm afraid.
Ta-daaaaaaa!
I was angry enough at my script that I rewrote it.
Now it uses XSLT instead of a browser.
#!/bin/bash
set -eEu -o pipefail
shopt -s failglob
trap 'echo >&2 "$0:${LINENO}: unknown error"' ERR
# MSY is a local computer parts retailer.
# Their parts list is HTML generated by Microsoft Word,
# and is notoriously slow to render in a browser.
#
# This script turns it into a simple line-delimited list,
# to make grepping much easier.
#
# EXAMPLE:
#
# msy >cache
# <cache foldr grep -i -- 2TB 3.5 7200 SATA
# <cache foldr grep -i -- Modular PSU '[5-9]..W'
#
# See also: http://www.accc.gov.au/search/accc-funnelback/MSY
# See also: http://www.staticice.com.au/
curl -sSf http://msy.com.au/Parts/PARTS_W.HTM |
# tidy seems to struggle to convert the encoding,
# so convert it in advance & then tell tidy to both read & write UTF-8.
# The --to ascii//translit makes grepping for inches (") easier
iconv -c --from gb2312 --to ascii//translit |
sed 's/&nbsp;/ /g' | # Downgrade NBSP to a plain space (CFWS).
sed -r 's|<o:p> ?</o:p>||g' | # Appease tidy.
{ tidy -asxml --numeric-entities yes -utf8 -w 0 -q --show-warnings no ||
(($? == 1)) # tidy warnings aren't fatal errors.
} |
sed '1,/^<html/c<html>' | # Appease xmlstarlet (namespaces are annoying).
xmlstarlet sel -T -t -m '//table[2]/tr' -v 'td[2]/p' -o ' ' -v 'td[1]/p' -n |
sort -rn | # Sort by descending price.
numfmt --invalid=ignore --padding=5 --grouping
#numfmt --invalid=ignore --padding=4 --to=si --round=up
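
The EXAMPLE comments lean on foldr, which lives at the URL quoted above and isn't reproduced in this message. For anyone reading without it, here is a rough stand-in. The name chain_filter is made up, and the behaviour is only a guess from the usage shown: everything up to and including "--" is taken as the command, and stdin is then filtered through that command once per remaining argument.

```shell
# Hypothetical stand-in for foldr; NOT the real script.
# Assumption: "chain_filter CMD [OPTS...] -- A B C" behaves like
#   CMD OPTS -- A | CMD OPTS -- B | CMD OPTS -- C
chain_filter() {
    local -a cmd=()
    # Collect the command and its options, up to and including "--".
    while (($#)); do
        cmd+=("$1")
        if [[ $1 == -- ]]; then
            shift
            break
        fi
        shift
    done
    if (($# == 0)); then
        # No arguments left to apply: pass stdin through unchanged.
        cat
    else
        local first=$1
        shift
        # Apply the command to the first argument, then recurse on the rest.
        "${cmd[@]}" "$first" | chain_filter "${cmd[@]}" "$@"
    fi
}
```

With that defined, the first example would read:

<cache chain_filter grep -i -- 2TB 3.5 7200 SATA

which keeps only the lines that match every one of the four patterns.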