[EMAIL PROTECTED] wrote:
> 
> ... But for this kind of job, parse is great. You just have to embed
> a little code. This is what I did:
> 

I've continued to ponder this one today, and I think I've been
subconsciously resisting embedded code as much as possible.  One of
the nice things about REs is that they AREN'T code, just description.
And, along the lines of the discussion that Bob inspired, they're
just data, which you can read from any old place and apply.

Ah, well.  Enuff mumbling.  However, this exercise did help me get
my head around the issue of why translating an RE (as a string) into
a REBOL parse rule is non-trivial in the general case.

>
> ========== Code:
> 

Discounting the test-harness, you got it down to...

> 
> nonbrack: complement to bitset! "([])"
> set [title name address email] copy/deep ["" "" "" ""]
> parse item [
>   copy title to ":" skip  (trim title)
>   copy name some nonbrack (trim name)
>   any [
>          "[" copy address to "]" skip (trim address) |
>          "(" copy email to ")" skip (trim email)
>       ]
> ]
>

which is a big improvement, both in size and in clarity!  Thanks!

> 
> If spaces are not important for delimiting fields, it's easier to use
> PARSE than PARSE/ALL. If you want to trim the spaces, it's easier to
> do that with TRIM than with PARSE itself.
>

Yeah, it's just my old habit of trying to put everything into the
pattern...  for reasons mentioned below.

>
> This code allows address and email to be in either order, though
> that's probably not an advantage in this case.
>

Not an advantage in this case, but also not at all harmful.  I'm
NEVER above accepting a generalization that simplifies without
breaking something.

>
> It's not always necessary for the parse to "succeed" (return true)
> in order for it to do what you want, so the BLANKS END you used
> at the end of your rule probably wasn't necessary.
>

True.  Old habits, again.  I'm used to writing code that's driven by
whether a pattern matched or not.  In the kinds of text hacking I've
done, failure to match might indicate:

1) bogus input (i.e., the item was really mis-typed),
2) bogus setup (i.e., the structure parsing got confused and passed
   on something that wasn't really supposed to be an item), or
3) there's more variation than I knew about in the input data.

That last one is particularly likely in some kinds of text munching.
Often I'm feeling my way along because nobody even knows HOW to give
me a spec (but they can quickly spot when they don't like the result).
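
A minimal sketch of that match-driven style in REBOL, assuming
PARSE/ALL and an explicit END so that the return value actually means
"the whole item fit the pattern".  The sample items, the names BLANKS
and ITEM-RULE, and the report wording are all made up for illustration;
the rule is a cut-down variant of the one quoted above with the COPY
steps left out.

```rebol
REBOL [Title: "Match-driven item check (sketch)"]

; Hypothetical sample items; only the first follows the expected layout.
items: [
    "Dr: Jane Doe [12 Elm St] (jane@example.com)"
    "this line has no title separator"
    "Ms: Ann Lee [1 Oak St] trailing junk"
]

nonbrack: complement charset "([])"
blanks: [any " "]

; With /ALL, spaces are significant, so they are consumed explicitly.
item-rule: [
    to ":" skip
    some nonbrack
    any [
        "[" to "]" skip blanks |
        "(" to ")" skip blanks
    ]
    end
]

foreach item items [
    either parse/all item item-rule [
        print ["matched:  " item]
    ][
        ; Could be bad input, bad setup, or unknown variation:
        ; report it rather than guess which.
        print ["UNMATCHED:" mold item]
    ]
]
```

Of the three samples, only the first should be reported as matched;
the other two land in the "go look at this by hand" pile.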

For example, I've done a fair bit with address databases (prior to my
present job).  Try taking two human-entered strings and determining
whether they're likely to be variants on the same address.  Or, as
another example, take a file of addresses (human-entered text) and
"alphabetize" them -- order them alphabetically by street name and
numerically by street number within the same street.

    35 Hickory Lane                27 Hickory Ln
    275 South Oak Street           35 Hickory Ln
    458 N Oak St                   159 Hickory Ln
    159  Hickory  Lane      ==>    48 N Oak St
    48 North Oak St                458 N Oak St
    222 Orchid Avenue              35 S Oak St
    27 Hickory Ln                  275 S Oak St
    35 S Oak Street                222 Orchid Ave
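
A rough sketch of one way to get that ordering in REBOL, assuming
every address has the shape "number [direction] street-words".  The
abbreviation table and the function names (ABBREVIATE, PAD, NORMALIZE,
SORT-ADDRESSES) are hypothetical, and real data would need far more
variation than this handles.

```rebol
REBOL [Title: "Address ordering sketch"]

; Hypothetical lookup table: each long form is followed by its abbreviation.
abbrevs: [
    "North" "N"    "South" "S"
    "Street" "St"  "Lane" "Ln"  "Avenue" "Ave"
]
directions: ["N" "S"]

; Replace a word by its abbreviation if the table has one.
abbreviate: func [word [string!]] [
    foreach [long short] abbrevs [
        if word = long [return short]
    ]
    word
]

; Left-pad a number with zeros so string order matches numeric order.
pad: func [n [integer!] /local s] [
    s: form n
    head insert/dup s "0" 6 - length? s
]

; Return [sort-key normalized-text] for one address string.
normalize: func [addr [string!] /local words dir name] [
    words: copy []
    foreach w parse addr none [append words abbreviate w]  ; split on whitespace
    dir: either find directions pick words 2 [pick words 2] [""]
    name: form skip words either empty? dir [1] [2]
    reduce [
        rejoin [name " " dir " " pad to integer! first words]
        form words
    ]
]

; Sort by street name, then direction, then house number.
sort-addresses: func [addrs [block!] /local recs out] [
    recs: copy []
    foreach addr addrs [append recs normalize addr]
    sort/skip recs 2                    ; records are [key text] pairs
    out: copy []
    foreach [key text] recs [append out text]
    out
]

foreach line sort-addresses [
    "35 Hickory Lane"  "275 South Oak Street"  "458 N Oak St"
    "159  Hickory  Lane"  "48 North Oak St"  "222 Orchid Avenue"
    "27 Hickory Ln"  "35 S Oak Street"
] [print line]
```

The sort key puts the street name ahead of the direction, which is
what keeps the two halves of Oak St together and ahead of Orchid Ave.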

>
> I put a COPY/DEEP in there as insurance to reduce the chances of
> code revisions introducing a you-know-what kind of bug.
> 

Ooooh.  The "s" word!  ;-)

Thanks, again!  As always, I learn a lot from swapping ideas with the
folks on this list.

-jn-
