Hello, I am new to the list. I have searched through the archive of previous
posts hoping to find the information I need, and I have read the various
documents at www.rebol.com, including the new user's guide.
I still can't do what I need to do!
If this material is covered somewhere, please let me know where and I'll go
look it up. Otherwise, I would appreciate some guidance.
~~~ ~~~ ~~~
Problem: I need to parse an HTML page and pull values out of certain fields for
later analysis. Can this be done with 'parse and if so, how?
Sample Data:
<TABLE>
<TR><TD>ALPHA</TD><TD>ONE</TD></TR>
<TR><TD>BETA</TD><TD>TWO</TD></TR>
<TR><TD COLSPAN=2>DUMMY LINE ONE</TD></TR>
<TR><TD>GAMMA</TD><TD>THREE</TD></TR>
<TR><TD>DELTA</TD><TD>FOUR</TD></TR>
<TR><TD COLSPAN=2>DUMMY LINE TWO</TD></TR>
<TR><TD>EPSILON</TD><TD>FIVE</TD></TR>
</TABLE>
Desired output:
ALPHA = ONE
BETA = TWO
GAMMA = THREE
DELTA = FOUR
EPSILON = FIVE
How I would do it in PERL:
<PERL>
## I am assuming the data is in a file specified on the command line
## and the output is being sent to STDOUT
$pattern = '<tr><td>([\w\s]*)<\/td><td>([\w\s]*)<\/td><\/tr>';
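while(<>)
{
    ## /i because the tags in the data are upper case; /g is not needed
    ## since each line holds at most one row
    if( $_ =~ m/$pattern/i ) { print "$1 = $2\n"; }
}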
</PERL>
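Since I need the values themselves for later analysis, not just printed
output, here is a rough sketch of the same idea that collects the pairs first.
The helper name extract_pairs and the inline sample lines are only made up for
illustration; they are not part of my real script.

```perl
use strict;
use warnings;

# Hypothetical helper (illustration only): return the two-cell rows as
# [name, value] pairs instead of printing them straight away.
sub extract_pairs {
    my @lines = @_;
    my $pattern = qr{<tr><td>([\w\s]*)</td><td>([\w\s]*)</td></tr>}i;
    my @pairs;
    for my $line (@lines) {
        # Rows like <TD COLSPAN=2> have only one cell and never match.
        push @pairs, [$1, $2] if $line =~ $pattern;
    }
    return @pairs;
}

# A few lines taken from the sample data above.
my @sample = (
    '<TABLE>',
    '<TR><TD>ALPHA</TD><TD>ONE</TD></TR>',
    '<TR><TD COLSPAN=2>DUMMY LINE ONE</TD></TR>',
    '<TR><TD>GAMMA</TD><TD>THREE</TD></TR>',
    '</TABLE>',
);

for my $pair (extract_pairs(@sample)) {
    my ($name, $value) = @$pair;
    print "$name = $value\n";
}
```

This is the behaviour I am trying to reproduce in REBOL: both cell values
captured into variables that I can work with afterwards.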
A Few Notes:
1) I only want to pull the cell contents out if there are two cells per row;
the other rows contain either needless data or section headers.
2) I actually need the values. They need to be reformatted and compared, so
'just' printing them would not be enough in the script.
3) I know how to split the file into lines in REBOL if that would help, and I
know how to MATCH the data in REBOL ... but I do _NOT_ know how to pull the
values out of that data.
4) I have tried combinations of [thru <tr> <td> copy txt1 to </td> <td>] (which
works fine for pulling out ONE value), but I cannot write a syntactically
correct parse grammar that would pull out both values.
5) Also, could someone please explain the weird finding I outline below.
>> sample-text: "alpha#beta"
== "alpha#beta"
>> probe parse sample-text [copy txt1 to "#" "#" copy txt2 to end (print [txt1 txt2])]
alpha beta
true
== true
>> sample-text: "alpha<td>beta"
== "alpha<td>beta"
>> probe parse sample-text [copy txt1 to <td> <td> copy txt2 to end (print [txt1 txt2])]
alpha beta
true
== true
>> sample-text: "alpha</td><td>beta"
== "alpha</td><td>beta"
>> probe parse sample-text [copy txt1 to </td> <td> </td> <td> copy txt2 to end (print [txt1 txt2])]
false
== false
; So, why is my parse grammar correct for a single separator (whether text or
tag) but incorrect for a double separator?
Thank you in advance for your assistance in this matter.
=====
Steve ~runester~ Jarjoura
"According to my calculations, that problem doesn't exist."