Erez Schatz <[email protected]> writes:
> Care to share your script here? I'd love to see what you came up with.
OK. I've stripped-out the application-specific data from the script;
here it is in its redacted form:
,{
,y/<form/ g/<head/ s/(.|\n)*>(Expected Page
Title)<(.|\n)*updated.*>([0-9]+\/[0-9]+\/[0-9]+)<(.|\n)*/:Title: \2\n:Date:
\4\n/
s/<form//
,y/<form/ v/<head/ y/<h2/ v|/h2| d
,y/<form/ v/<head/ s/<h2/\n/
,y/<form/ v/<head/ y/<h2/ {
g|/h2| y/<tr/ {
v/<td/ s/([
>]| )*([^<]*)<\/h2>(.|\n)*/\n\n\2\n==========================\n/
g/<td/ g/Entry#/ y/<td/ {
v|/td| c/\n/
}
g/<td/ v/Entry#/ y/<td/ v|/td| d
g/<td/ y/<td/ {
g|/td| v/colspan="3"/ g/>(Header Foo)/ y/(Header Foo)/ {
v|/td| c/:/
g|/td| s/ : /:/
g|/td| x/(\n|<[^<>]+>\??| )+/ c/ /
g|/td| a/\n/
}
g|/td| v/colspan="3"/ g/>(Header Bar)/ y/(Header Bar)/ {
v|/td| c/:/
g|/td| i/:/
g|/td| x/( |\n|<[^<>]+>\??| )+/ c/ /
g|/td| a/\n/
}
g|/td| v/colspan="3"/ g/>(Entry#|Field Foo|Filed Bar|Field Baz|Field
Quux|Field Snarf|Field Barf)/ y/(Entry#|Field Foo|Filed Bar|Field Baz|Field
Quux|Field Snarf|Field Barf)/ {
v|/td| c/:/
g|/td| s/[:.?]*/:/
g|/td| x/( |\n|<[^<>]+>| )+/ c/ /
g|/td| a/\n/
}
g|/td| v/colspan="3"/ g/inch/ {
s/[^>]*>( |\n|<[^<>]+>| )*(([a-zA-Z0-9\-]+ )+inch
?([a-zA-Z0-9\-]+ )*[a-zA-Z0-9\-]+)( |\n|<[^<>]+>| )*/:Inches: \2\n/
}
g|/td| v/colspan="3"/ g/Stock/ {
i/:Density:/
x/( |\n|<?[^<>]+>| )+/ c/ /
a/\n/
}
g|/td| g/colspan="3"/ {
s/[^>]*>/:Details:/
y/[^>]*colspan="3"[^>]*>/ x/( |\n|<[^<>]+>\??| )+/ c/ /
a/\n:Notes: \n\n/
}
g|/td| g|checkbox| d
}
}
}
}
,x/<h2|<tr|<td/d
That last line is a bit of a hack. I needed it because there didn't
appear to be any way to delete the "<h2" delimiters from within the
,y/<h2/. But it works because those HTML tags do not appear in the
final reStructuredText.
A lot of the complexity of the script comes from the need to keep the
changes in sequence. I really hope that the implementation of the sam
language in Acme doesn't impose the same requirement to keep changes in
order; it's a Real Pain(TM).
One of the things that still perplexes me is the apparent necessity of
the s command. The sam paper claims that the s command isn't necessary.
But I couldn't find any way to do the edits without resorting to it. If
you could figure out how to replace the s commands with a combination of
other sam commands, I'd be quite impressed indeed!
--
+---------------------------------------------------------------+
|Smiley <[email protected]> PGP key ID: BC549F8B |
|Fingerprint: 9329 DB4A 30F5 6EDA D2BA 3489 DAB7 555A BC54 9F8B|
+---------------------------------------------------------------+