"Trent W. Buck via luv-main" writes:

> Tim Connors wrote:
>> msy?  That looks nifty
>
> http://www.cyber.com.au/~twb/.bin/msy
> http://www.cyber.com.au/~twb/.bin/foldr
>
> Not using XSLT I'm afraid.

Ta-daaaaaaa!

I was angry enough at my script that I rewrote it.
Now it uses XSLT instead of a browser.
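
One wrinkle on the XPath side: tidy -asxml writes the root element as <html xmlns="http://www.w3.org/1999/xhtml">, and an unprefixed XPath step such as //table does not match elements that live in a default namespace. The script below sidesteps this with sed '1,/^<html/c<html>', which replaces everything from line 1 through the next line starting with <html by a bare <html>. A minimal sketch of just that step, assuming GNU sed (the one-line form of "c" is a GNU extension) and a hypothetical sample preamble rather than real MSY output:

```shell
#!/bin/bash
# Sketch of the script's namespace-stripping step (GNU sed assumed).
set -eu -o pipefail

# Replace lines 1..(next line matching ^<html) with a bare <html>,
# dropping the XML declaration, the DOCTYPE and the xmlns attribute.
strip_xhtml_preamble() {
    sed '1,/^<html/c<html>'
}

# Hypothetical input shaped like tidy -asxml output (not real MSY data).
sample='<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<body><table><tr><td><p>x</p></td></tr></table></body>
</html>'

# Prints three lines: <html>, the <body> line, </html>.
printf '%s\n' "$sample" | strip_xhtml_preamble
```

Once the xmlns is gone, xmlstarlet can match //table[2]/tr without needing a namespace prefix.
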

#!/bin/bash
set -eEu -o pipefail
shopt -s failglob
trap 'echo >&2 "$0:${LINENO}: unknown error"' ERR

# MSY is a local computer parts retailer.
# Their parts list is HTML generated by Microsoft Word,
# and is notoriously slow to render in a browser.
#
# This script turns it into a simple line-delimited list,
# to make grepping much easier.
#
# EXAMPLE:
#
#   msy >cache
#   <cache foldr grep -i -- 2TB 3.5 7200 SATA
#   <cache foldr grep -i -- Modular PSU [5-9]..W
#
# See also: http://www.accc.gov.au/search/accc-funnelback/MSY
# See also: http://www.staticice.com.au/

curl -sSf http://msy.com.au/Parts/PARTS_W.HTM |

# tidy seems to struggle to convert the encoding,
# so convert it in advance & then tell tidy to both read & write UTF-8
# (plain ASCII is also valid UTF-8).
# The --to ascii//translit folds curly quotes to a plain ", which
# makes grepping for inches (") easier.
iconv -c --from gb2312 --to ascii//translit |
sed 's/&nbsp;/ /g' |            # Downgrade every NBSP to a space.
sed -r 's|<o:p> ?</o:p>||g' |   # Appease tidy.
{   tidy -asxml --numeric-entities yes -utf8 -w 0 -q --show-warnings no ||
    (($? == 1))                 # tidy warnings aren't fatal errors.
} |

sed '1,/^<html/c<html>' |       # Drop xmlns so plain XPath matches (namespaces are annoying).
xmlstarlet sel -T -t -m '//table[2]/tr' -v 'td[2]/p' -o ' ' -v 'td[1]/p' -nl |
sort -rn |                      # Sort by descending price.
numfmt --invalid=ignore --padding=5 --grouping
#numfmt --invalid=ignore --padding=4 --to=si --round=up
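
About the { tidy ... || (($? == 1)); } group: tidy exits with 0 when clean, 1 when it only issued warnings and 2 on errors, and under set -e -o pipefail any non-zero status would abort the whole pipeline. The group turns status 1 into success while still letting 2 fail. A minimal sketch of the same pattern, using a hypothetical stand-in fake_tidy so it runs without tidy installed:

```shell
#!/bin/bash
# Sketch: treat exit status 1 ("warnings only") as success, fail otherwise.
set -eu -o pipefail

# Hypothetical stand-in for tidy: copy stdin to stdout, then exit with
# the status passed as $1.
fake_tidy() {
    cat
    return "$1"
}

# Same shape as the script's tidy group: on failure, $? still holds
# fake_tidy's status, so (($? == 1)) succeeds only for status 1.
tolerate_warnings() {
    { fake_tidy "$1" || (($? == 1)); }
}

echo '<p>ok</p>' | tolerate_warnings 1    # prints <p>ok</p>, exit status 0
```

With status 2 the arithmetic test is false, so the group returns non-zero and pipefail plus set -e still stop the script.
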
_______________________________________________
luv-main mailing list
http://lists.luv.asn.au/listinfo/luv-main