[REBOL] yet MORE parsing Re:

bhandley Sun, 16 Jul 2000 19:15:15 -0700
> Hi.
>
> I have an outlining program which can export a simply formatted text
> file using tabs and + and - chars to define the outline.
>
>
> Here's an example:
>
>+ node 1 heading
>this is the content of node 1
> + sub-node a heading
>this is the content of sub-node a.  Note that
>the content is not tabbed, whereas the heading
>is, in order to describe its nesting level.
> - sub-node b (with no content)
> + sub-node c
>  + sub-node d heading
>this is the content for sub-node d
>
>
> Anyway, I hope you get the idea.  I want to parse this so that I can
> xml-, html- and rebol block-ize it.

Perhaps REBOL block-ize it first, then convert that block to the others.

>
> In the past, I've used parse for some very simple tasks but I'm having
> trouble making the leap to understand what's going on in the more
> complex uses.
>
> In the hopes of appealing to your collective pity, here's an example of
> my truly dismal understanding of parse.  I wanted to start off with
> returning a block containing all of the node headings from a file
> coupled with their "level".  Here's (rather embarrassingly) what I came
> up with:
>
> rule: [
>     some [thru "^-" (level: level + 1)]
>     thru "+ " | thru "- "
>     copy heading to newline (
>         append o-block reduce[level heading]
>         level: 0
>     )
> ]
>
> This returns me only one heading (not the first, or the last) with a
> count of every tab it's encounted to that point.

You haven't enclosed the heading rule in a some (or any) rule, so it is only tried once.
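A minimal sketch of the difference, using parse/all as the full script does. A rule that is not wrapped in some or any is tried exactly once, and parse only returns true if the whole input was consumed. The strings are made up for illustration.

```rebol
REBOL [Title: "some vs. a single match (sketch)"]

; Without some, the rule "a" is tried once; two "a"s are left over.
print parse/all "aaa" ["a"]          ; false
; some repeats the rule until it stops matching (at least one match).
print parse/all "aaa" [some "a"]     ; true
; any also repeats, but allows zero matches.
print parse/all "" [any "a"]         ; true
```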

>I'm sure the OR line is not right, and the some probably isn't
> being used properly, but I just don't get it.

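Like the some in your second line, you need to enclose your alternatives in
a block so that they form a rule of their own, e.g. [ somethinga |
somethingb ]. Without the brackets, | splits your whole rule into two
alternatives: everything before it and everything after it.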
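A sketch of that precedence rule, using made-up one-character tokens rather than your outline format. The last part also shows what most likely happened to your level count: an action in parens runs as soon as parse reaches it and is not undone when the alternative it sits in fails later. So the tabs counted by your first alternative were still counted when the second alternative appended its heading.

```rebol
REBOL [Title: "Grouping alternatives with brackets (sketch)"]

; Without brackets, | splits the WHOLE rule into two alternatives:
;   1) "a" "+"      2) "-" "b"
ungrouped: ["a" "+" | "-" "b"]
; With brackets, only the sign is a choice: "a", then "+" or "-", then "b".
grouped: ["a" ["+" | "-"] "b"]

print parse/all "a-b" ungrouped    ; false
print parse/all "a-b" grouped      ; true

; Actions in parens run when reached and are NOT undone if the
; alternative they are in fails later on.
hits: 0
print parse/all "xx-" [some ["x" (hits: hits + 1)] "+" | "-"]  ; false
print hits                         ; 2
```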

>
> Parse questions are probably annoying, but has anybody got some pointers
> (I've read the users guide, yes) or suggestions on how I might proceed
> to handle this nested structure - or just understand parse better?

Annoying? Not at all. Parse is one of the best bits of REBOL, if not a
fundamental part of it.
Hopefully there will be more information coming soon in the form of new
documentation.

When creating a grammar (I'm not talking about Parse here), you need to
think about what information the grammar is conveying. Your simple text
format has a grammar, so what does it convey? In your example it appears
that tabs not only indicate the level but are used only for headings. "+"
and "-" are used only for headings too, so except for first-level headings
they are redundant as far as processing headings goes, though they may be
useful to a human reader. This gives clues to the rule for processing
headings.

When building rules for Parse you need to be mindful that your rules must
be "complete", in the sense of fully describing the elements (tokens) you
wish to Parse out. Part of this is making sure your rules "consume" the
input stream properly.

I've found that interactively building small, self-contained rules and
testing them boosts my confidence, each one testing a different aspect of
what I need to achieve. This is a bottom-up approach.
In parallel I also use a top-down approach to describe the general
structures (like interpreting a format specification).
At some point I meld the results of the bottom-up and top-down work
together to get the first full draft. Then comes more tweaking, testing,
expletives, etc.
:)
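As an example of that kind of interactive testing, here are a few one-line checks on made-up strings. They also show what "consuming" the input means in practice: a rule that matches everything except a trailing newline still makes parse return false.

```rebol
REBOL [Title: "Rules must consume the whole input (sketch)"]

print parse/all "ab^/" ["ab"]               ; false: the newline is never consumed
print parse/all "ab^/" ["ab" newline]       ; true
print parse/all "ab" ["ab" opt newline]     ; true: opt lets the newline be missing
```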

In learning parse, I've found that multiple readings of the parse guide
are useful, while actually coding the rules.

Following my bottom-up approach, I'll do the heading first (I've roughly
split heading-rule into lines that reflect the elements I have in mind after
looking at the grammar of your example).

heading-rule: [
    (level: 0)
    any [ "^-" (level: add level 1)]
    ["+" | "-"]
    opt " "
    copy heading [to "^/" | to end]
]

Of course this took me a bit of testing to get it right. I started on the
string "^-+ sub-node a heading" and tested some others as well.

This rule says:
"Start the level count at zero."
"There could be multiple tabs, and if there are, count 'em."
"Next, either a plus or a minus."
"Next, an optional space."
"Lastly, copy whatever follows, up to the next newline or the end of the
input."

Note that each match "consumes" the input.
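The testing step can be sketched like this: the heading rule exactly as above, run on its own against the test string plus one nested variant. It uses parse/all because plain parse in REBOL 2 skips whitespace between matches, which would swallow the very tabs we want to count.

```rebol
REBOL [Title: "Testing heading-rule on its own (sketch)"]

heading-rule: [
    (level: 0)
    any [ "^-" (level: add level 1)]
    ["+" | "-"]
    opt " "
    copy heading [to "^/" | to end]
]

; ^- is a tab inside a REBOL string.
print parse/all "^-+ sub-node a heading" heading-rule    ; true
print level                                              ; 1
print heading                                            ; sub-node a heading

print parse/all "^-^-+ sub-node d heading" heading-rule  ; true
print level                                              ; 2
```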

On to the content. I decided to use a match rule that keeps matching as long
as the input consists of characters allowed in my element.

match-content-character: complement charset "^-^/"
content-line-rule: [
    copy content
    some
    match-content-character
]

This rule says:
"Copy the input as long as it matches the next match rule"
"Match one or more of"
"a character that is not a tab or a newline"

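Tested the same way, the content rule matches a whole untabbed line, refuses a line that starts with a tab, and stops at a newline, so it only ever covers one line. Note that it would also accept a first-level heading such as "+ node 1 heading", which is why heading-rule is tried before it in the top-level rule. The test strings are made up.

```rebol
REBOL [Title: "Testing content-line-rule on its own (sketch)"]

match-content-character: complement charset "^-^/"
content-line-rule: [
    copy content
    some
    match-content-character
]

print parse/all "this is the content of node 1" content-line-rule  ; true
print content                                  ; this is the content of node 1
print parse/all "^-+ sub-node a heading" content-line-rule  ; false: starts with a tab
print parse/all "two^/lines" content-line-rule  ; false: stops at the newline
```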
Now for the top-down part, which turns out to be simple once the bottom-up stuff is done.
simple-format-text: [
    some [
        newline | heading-rule | content-line-rule
    ]
]

You can guess what this says:
"Match one or more of"
"Either newline OR heading-rule OR content-line-rule"

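Combined with the two rules above, the top-level rule can already check whether a text follows the format, before any tokens are saved. A sketch; the sample strings are made up along the lines of your example, and the good sample is built with rejoin from tab and newline so the tabs are visible.

```rebol
REBOL [Title: "Checking the format with simple-format-text (sketch)"]

heading-rule: [
    (level: 0)
    any [ "^-" (level: add level 1)]
    ["+" | "-"]
    opt " "
    copy heading [to "^/" | to end]
]
match-content-character: complement charset "^-^/"
content-line-rule: [copy content some match-content-character]

; heading-rule comes first: content-line-rule would also accept
; an untabbed first-level heading.
simple-format-text: [
    some [
        newline | heading-rule | content-line-rule
    ]
]

good: rejoin [
    "+ node 1 heading" newline
    "this is the content of node 1" newline
    tab "- sub-node b (with no content)" newline
]
bad: "^-content is never tabbed^/"

print parse/all good simple-format-text   ; true
print parse/all bad simple-format-text    ; false
```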
OK, time to meld it together. Notice that I'm not saving my tokens yet. So
what I'll do is put it all together and save my tokens to a block, including
words to represent the meanings of those tokens. Which really means I'm
translating your grammar into mine. Hey, a dialect! Shh..

REBOL [
    Author: "Brett Handley"
    Purpose: "A parse example for Chris."
]

heading-rule: [
    (level: 0)
    any [ "^-" (level: add level 1)]
    ["+" | "-"]
    opt " "
    copy heading [to "^/" | to end]
    (append result-block reduce ['heading level heading])
]

match-content-character: complement charset "^-^/"
content-line-rule: [
    copy content
    some
    match-content-character
    (append result-block reduce ['content content])
]

simple-format-text: [
    (result-block: copy [])
    some [
        newline | heading-rule | content-line-rule
    ]
]

; Usage parse/all read %somefile.txt simple-format-text
; print mold result-block

; --------------
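To try the script without a file, the same rules can be run on a string. A sketch; the sample text is made up in the shape of your example.

```rebol
REBOL [Title: "Running the full example on a string (sketch)"]

heading-rule: [
    (level: 0)
    any [ "^-" (level: add level 1)]
    ["+" | "-"]
    opt " "
    copy heading [to "^/" | to end]
    (append result-block reduce ['heading level heading])
]

match-content-character: complement charset "^-^/"
content-line-rule: [
    copy content
    some
    match-content-character
    (append result-block reduce ['content content])
]

simple-format-text: [
    (result-block: copy [])
    some [
        newline | heading-rule | content-line-rule
    ]
]

sample: rejoin [
    "+ node 1 heading" newline
    "this is the content of node 1" newline
    tab "+ sub-node a heading" newline
    tab "- sub-node b (with no content)" newline
]

print parse/all sample simple-format-text   ; true
print mold result-block
```

result-block should then hold heading 0 "node 1 heading", content "this is the content of node 1", heading 1 "sub-node a heading" and heading 1 "sub-node b (with no content)", as one flat block of word/value groups.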

If I were going to use this in a larger project, I would consider making the
rules and the parse command part of an object.
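One way that might look, as a sketch only: outline-parser and parse-outline are made-up names for illustration. Declaring result-block, level, heading and content at the top of the object makes them fields of the object rather than global words, since make object! only turns top-level set-words into fields.

```rebol
REBOL [Title: "Rules wrapped in an object (sketch)"]

; outline-parser and parse-outline are hypothetical names for illustration.
outline-parser: make object! [
    ; Top-level set-words become fields, so the rules below set these
    ; instead of global words.
    result-block: level: heading: content: none

    heading-rule: [
        (level: 0)
        any [ "^-" (level: add level 1)]
        ["+" | "-"]
        opt " "
        copy heading [to "^/" | to end]
        (append result-block reduce ['heading level heading])
    ]

    match-content-character: complement charset "^-^/"
    content-line-rule: [
        copy content
        some
        match-content-character
        (append result-block reduce ['content content])
    ]

    simple-format-text: [
        (result-block: copy [])
        some [
            newline | heading-rule | content-line-rule
        ]
    ]

    ; Returns the token block, or none if the text doesn't fit the format.
    parse-outline: func [text [string!]] [
        either parse/all text simple-format-text [result-block] [none]
    ]
]

; ^- is a tab and ^/ a newline inside a REBOL string.
probe outline-parser/parse-outline "+ top^/some text^/^-- child^/"
```

Here parse-outline returns [heading 0 "top" content "some text" heading 1 "child"], and none for text that doesn't fit the format.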

I hope the example approach was helpful. I thought it would be the best way
to describe my way of doing it.
Brett.