[REBOL] Pulling values from parsed HTML ... more REGEX trouble Re:

icimjs Wed, 19 Jan 2000 19:36:08 -0800
Hi Steve,

The simple question first:
>>> sample-text: "alpha</td><td>beta"
>== "alpha</td><td>beta"
>>> probe parse sample-text [copy txt1 to </td> <td> </td> <td> copy txt2 to end
>(print [txt1 txt2])]
>false
>== false
>
>
>; So, why is my parse grammar correct for a single separator (whether text or
>tag) but incorrect for a double separator?

1. Text to be matched must be in quotation marks. Use "<td>" instead of
<td>. REBOL recognizes <td> as a value of the tag datatype, but parse
apparently does not match it against string input.

2. The first part of your rule is
to </td>
This navigational instruction tells REBOL to advance through the string up to
the first occurrence of the closing tag </td>.

Furthermore, you have an action instruction
copy txt1

This instruction tells REBOL to copy everything it passes over on the way to
</td> into txt1.

Your next instruction tells parse to match the text at the current position
against the tag <td>. This is where parse fails, and with good reason.

Your sample text is
"alpha</td><td>beta"

Since to </td> navigated up to, but not past, the closing tag </td>, parse is
still positioned just in front of </td>. Your rule prescribes that parse
match the next character sequence it encounters, which is </td>, against the
opening tag <td>.

Since </td> (the text parse is now pointing at) is different from <td> (the
opening tag you provide as the criterion for your match), this match fails
and parse returns false at this point.
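
To see this at the console, you can record the position parse has reached by putting a set-word inside the rule. What follows is only a sketch: mark and success are names made up for illustration, and string literals are used in place of tag literals, as suggested in point 1. It first shows where to and thru leave the input, then a version of the rule that consumes both tags in the order they occur.

```rebol
REBOL []

sample-text: "alpha</td><td>beta"

; TO stops just in front of the match; mark records that position.
parse sample-text [to "</td>" mark: to end]
print mark    ; </td><td>beta

; THRU moves past the match.
parse sample-text [thru "</td>" mark: to end]
print mark    ; <td>beta

; Corrected rule: copy up to </td>, then match </td> and <td> in that order.
success: parse sample-text [
    copy txt1 to "</td>"
    "</td>" "<td>"
    copy txt2 to end
]
print [success txt1 txt2]    ; true alpha beta
```

Writing thru "<td>" in place of the two literal tags would work as well, since thru moves past whatever lies in front of its target.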


Now for your second question:
>Problem: I need to parse an HTML page and pull values out of certain fields for
>later analysis. Can this be done with 'parse and if so, how?
>
>Sample Data:
><TABLE>
><TR><TD>ALPHA</TD><TD>ONE</TD></TR>
><TR><TD>BETA</TD><TD>TWO</TD></TR>
><TR><TD COLSPAN=2>DUMMY LINE ONE</TD></TR>
><TR><TD>GAMMA</TD><TD>THREE</TD></TR>
><TR><TD>DELTA</TD><TD>FOUR</TD></TR>
><TR><TD COLSPAN=2>DUMMY LINE TWO</TD></TR>
><TR><TD>EPSILON</TD><TD>FIVE</TD></TR>
></TABLE>
>
>Desired output:
>ALPHA = ONE
>BETA = TWO
>GAMMA = THREE
>DELTA = FOUR
>EPSILON = FIVE
>

try:
REBOL []

; Accumulates one output line per matching table row.
result: copy {}

parse-string: {
<TABLE>
<TR><TD>ALPHA</TD><TD>ONE</TD></TR>
<TR><TD>BETA</TD><TD>TWO</TD></TR>
<TR><TD COLSPAN=2>DUMMY LINE ONE</TD></TR>
<TR><TD>GAMMA</TD><TD>THREE</TD></TR>
<TR><TD>DELTA</TD><TD>FOUR</TD></TR>
<TR><TD COLSPAN=2>DUMMY LINE TWO</TD></TR>
<TR><TD>EPSILON</TD><TD>FIVE</TD></TR>
</TABLE>
}



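parse parse-string [
  some [
         ; A row with two plain cells: capture the text of both cells.
         ["<TR><TD>" copy text-1 to "</TD>" thru "<TD>" copy text-2 to "</TD></TR>"
           (append result reduce [text-1 " = " text-2 newline])
         ]
         |
         ; Any other row (such as the COLSPAN rows): skip the whole row.
         ["<TR>" thru "</TR>"]
         |
         ; Anything else: advance by one character.
         skip
  ]
]

print result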
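
If the values are wanted for later analysis rather than only as text, the same rule can collect them into a block of name/value pairs. This is a minimal sketch under assumptions: html, pairs, name and value are hypothetical names that do not appear in the script above, and the sample is cut down to three rows.

```rebol
REBOL []

html: {
<TABLE>
<TR><TD>ALPHA</TD><TD>ONE</TD></TR>
<TR><TD COLSPAN=2>DUMMY LINE ONE</TD></TR>
<TR><TD>GAMMA</TD><TD>THREE</TD></TR>
</TABLE>
}

; Flat block of alternating names and values.
pairs: copy []

parse html [
    some [
        ; Two-cell row: keep both cell texts as one pair.
        "<TR><TD>" copy name to "</TD>" thru "<TD>" copy value to "</TD></TR>"
        (append pairs reduce [name value])
        ; Any other row is skipped whole.
        | "<TR>" thru "</TR>"
        ; Everything else is stepped over one character at a time.
        | skip
    ]
]

; Walk the block two items at a time.
foreach [name value] pairs [print [name "=" value]]

; Look up one value by its name.
print select pairs "GAMMA"
```

With the pairs in a block, select gives you the value that follows a given name, which is usually easier to work with than a formatted string.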

Hope this helps


;- Elan >> [: - )]