[EMAIL PROTECTED] wrote:
>
> ... But for this kind of job, parse is great. You just have to embed
> a little code. This is what I did:
>
I've continued to ponder this one today, and I think I've been
subconsciously resisting embedded code as much as possible. One of
the nice things about REs is that they AREN'T code, just description.
And, along the lines of the discussion that Bob inspired, they're
just data, which you can read from any old place and apply.
Ah, well. Enuff mumbling. However, this exercise did help me get
my head around the issue of why translating an RE (as a string) into
a REBOL parse rule is non-trivial in the general case.
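(For what it's worth, here's the "REs are just data" idea as a little
Python sketch -- a hypothetical illustration, not anybody's real code.
The item format and sample string are made up to match the parse rule
quoted below, and unlike that rule this pattern fixes the order of the
bracketed and parenthesized fields.)

```python
import re

# Hypothetical item format: "Title: Name [address] (email)"
# The pattern is just a string -- it could be read from a file or a
# database and applied, with no code embedded in it.
pattern = (r"(?P<title>[^:]*):"               # everything up to the colon
           r"(?P<name>[^([\]]*)"              # name: stops at ( or [ or ]
           r"(?:\[(?P<address>[^]]*)\])?\s*"  # optional [address]
           r"(?:\((?P<email>[^)]*)\))?")      # optional (email)

item = "Dr: Jane Doe [12 Elm St] (jane@example.com)"  # made-up sample
m = re.match(pattern, item)
fields = {k: (v or "").strip() for k, v in m.groupdict().items()}
```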
>
> ========== Code:
>
Discounting the test-harness, you got it down to...
>
> nonbrack: complement to bitset! "([])"
> set [title name address email] copy/deep ["" "" "" ""]
> parse item [
> copy title to ":" skip (trim title)
> copy name some nonbrack (trim name)
> any [
> "[" copy address to "]" skip (trim address) |
> "(" copy email to ")" skip (trim email)
> ]
> ]
>
which is a big improvement, both in size and in clarity! Thanks!
>
> If spaces are not important for delimiting fields, it's easier to use
> PARSE than PARSE/ALL. If you want to trim the spaces, it's easier to
> do that with TRIM than with PARSE itself.
>
Yeah, it's just my old habit of trying to put everything into the
pattern... for reasons mentioned below.
>
> This code allows address and email to be in either order, though
> that's probably not an advantage in this case.
>
Not an advantage in this case, but also not at all harmful. I'm
NEVER above accepting a generalization that simplifies without
breaking something.
>
> It's not always necessary for the parse to "succeed" (return true)
> in order for it to do what you want, so the BLANKS END you used
> at the end of your rule probably wasn't necessary.
>
True. Old habits, again. I'm used to writing code that's driven by
whether a pattern matched or not. In the kinds of text hacking I've
done, failure to match might indicate:
1) bogus input (i.e., the item was really mis-typed),
2) bogus setup (i.e., the structure parsing got confused and passed
on something that wasn't really supposed to be an item), or
3) there's more variation than I knew about in the input data.
That last one is particularly likely in some kinds of text munching.
Often I'm feeling my way along because nobody even knows HOW to give
me a spec (but they can quickly spot when they don't like the result).
For example, I've done a fair bit with address databases (prior to my
present job). Try taking two human-entered strings and determining
whether they're likely to be variants on the same address. Or, as
another example, take a file of addresses (human-entered text) and
"alphabetize" them -- order them alphabetically by street name and
numerically by street number within the same street.
     35 Hickory Lane              27 Hickory Ln
    275 South Oak Street          35 Hickory Ln
    458 N Oak St                 159 Hickory Ln
    159 Hickory Lane     ==>      48 N Oak St
     48 North Oak St             458 N Oak St
    222 Orchid Avenue             35 S Oak St
     27 Hickory Ln               275 S Oak St
     35 S Oak Street             222 Orchid Ave
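(A rough Python sketch of that ordering -- not the code I actually used
back then. The abbreviation table and the idea of peeling off a leading
direction word are assumptions; real address data needs a much bigger
table. This version sorts the original strings but doesn't rewrite them
into the abbreviated forms shown in the right-hand column.)

```python
# Assumed normalizations -- a real job would need a far larger table.
ABBREV = {"lane": "ln", "street": "st", "avenue": "ave",
          "north": "n", "south": "s", "east": "e", "west": "w"}
DIRECTIONS = {"n", "s", "e", "w"}

def sort_key(addr):
    """Key: street name alphabetically, then direction, then number."""
    words = [ABBREV.get(w.lower(), w.lower()) for w in addr.split()]
    number = int(words[0])          # leading house number
    rest = words[1:]
    direction = ""
    if rest and rest[0] in DIRECTIONS:   # "48 North Oak St" -> n, oak st
        direction = rest[0]
        rest = rest[1:]
    return " ".join(rest), direction, number

addresses = ["35 Hickory Lane", "275 South Oak Street", "458 N Oak St",
             "159 Hickory Lane", "48 North Oak St", "222 Orchid Avenue",
             "27 Hickory Ln", "35 S Oak Street"]
ordered = sorted(addresses, key=sort_key)
```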
>
> I put a COPY/DEEP in there as insurance to reduce the chances of
> code revisions introducing a you-know-what kind of bug.
>
Ooooh. The "s" word! ;-)
Thanks, again! As always, I learn a lot from swapping ideas with the
folks on this list.
-jn-