Not sure I see the problem ...

I wasn't talking about changing sets on the fly at run time, only about a *one-time* rewriting: Tag1·Tag2 tag strings would be rewritten as /.../r regexes in order to preserve tag order. This rewriting would happen only once, at compile time, not at run time. And since we basically have a LINE tag already, the rewritten tag strings could be matched against this LINE.

The tag string rewriting is so simple that it could be done by a grammar preprocessor rather than by the actual compiler.
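A minimal sketch of such a preprocessor, in Python and purely illustrative: it assumes the grammar is available as plain text lines, that ordered tags are written (Tag1·Tag2) or (Tag1__Tag2) inside an inline set, and that tag names contain no parentheses, whitespace or double underscores. The helper names rewrite_grammar_line and ordered_to_regex are hypothetical.

```python
import re

# Assumed non-breaking separators from the proposal: "·" (middle dot) or "__".
SEPARATOR = re.compile(r"·|__")

# An inline set like (Tag1·Tag2) or (Tag1__Tag2): at least two tags joined by a separator.
ORDERED_SET = re.compile(r"\(([^()\s]+(?:(?:·|__)[^()\s]+)+)\)")


def ordered_to_regex(tags: list[str]) -> str:
    """Turn ["Tag1", "Tag2"] into the ordered-tag regex /^(.* )?Tag1 Tag2( .*)?$/r."""
    # Python-style escaping; assumed compatible with the grammar's regex engine.
    body = " ".join(re.escape(tag) for tag in tags)
    return f"/^(.* )?{body}( .*)?$/r"


def rewrite_grammar_line(line: str) -> str:
    """Rewrite every ordered-tag set on one grammar line, leaving the rest untouched."""
    def replace(match: re.Match) -> str:
        tags = SEPARATOR.split(match.group(1))
        return "(" + ordered_to_regex(tags) + ")"

    return ORDERED_SET.sub(replace, line)


print(rewrite_grammar_line("REMOVE (Tag3) IF (*1 (Tag1·Tag2)) ;"))
# REMOVE (Tag3) IF (*1 (/^(.* )?Tag1 Tag2( .*)?$/r)) ;
```

Since the rewrite is purely textual, it runs once before compilation and the compiler only ever sees ordinary regex tags.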

The only thing that would need to be adapted is that these ordered-tag regexes would be recognized as such and tested against LINE rather than against the ordinary tags or sets in the cohort. Isn't that just a simple IF branch during target/context matching? A bit like the way $$ unification sets are treated differently?
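Purely as an illustration of that IF branch, here is a Python sketch (not CG-3 code) in which a hypothetical TagTest carries an ordered flag set by the preprocessor. Ordered tests are matched as a regex against the LINE string, all others as a plain, order-insensitive tag lookup. The Reading and TagTest classes and the matches function are invented for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Reading:
    """One reading: the cohort's word form plus the reading's tags, in order."""
    wordform: str
    tags: list[str] = field(default_factory=list)

    @property
    def line(self) -> str:
        # Hypothetical LINE tag: the whole reading as ONE string, not split on space.
        return " ".join([self.wordform, *self.tags])


@dataclass
class TagTest:
    """A target/context test; ordered=True marks a rewritten (Tag1·Tag2) set."""
    tags: list[str]
    ordered: bool = False


def matches(test: TagTest, reading: Reading) -> bool:
    if test.ordered:
        # The extra branch: build ^(.* )?Tag1 Tag2( .*)?$ and test it against LINE.
        body = " ".join(re.escape(tag) for tag in test.tags)
        return re.match(f"^(.* )?{body}( .*)?$", reading.line) is not None
    # Ordinary composite tag (Tag1 Tag2): all tags present, order ignored.
    return set(test.tags) <= set(reading.tags)


walked = Reading('"<walked>"', ['"walk"', "V", "PAST"])
print(matches(TagTest(["V", "PAST"], ordered=True), walked))  # True
print(matches(TagTest(["PAST", "V"], ordered=True), walked))  # False
print(matches(TagTest(["PAST", "V"]), walked))                # True
```

For brevity the sketch builds the pattern on every call; with the compile-time rewriting described above it would be compiled once, when the grammar is loaded.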

Of course I'm talking algorithmically here, not claiming to predict the complexity of an implementation. But it looks feasible to me.

-- Eckhard


On 06/13/2017 08:46 PM, Tino Didriksen wrote:
Replied inline...

On 10 June 2017 at 07:40, Eckhard Bick <[email protected]> wrote:

    1. We introduce a magic tag LINE, maintained by the compiler,
    constituted by the *whole* reading line (plus the word form at the
    start) as *one* tag, i.e. *without breaking on space*.


That part is easy and basically already done. The reading already stores an ordered list of tags - this is not the problem.
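To make the LINE idea concrete: a small, hypothetical Python helper that takes one cohort in the usual CG text format (word form on the first line, one indented reading per following line) and returns one LINE string per reading, with the word form prepended and the tags kept in their original order. The function name reading_lines is made up for the example.

```python
def reading_lines(cohort: str) -> list[str]:
    """Return one LINE string per reading: word form + reading tags, unsplit."""
    first, *readings = cohort.splitlines()
    wordform = first.strip()
    # Each non-empty indented line is one reading; its tag order is preserved as-is.
    return [f"{wordform} {reading.strip()}" for reading in readings if reading.strip()]


cohort = '"<walked>"\n\t"walk" V PAST\n\t"walk" V PCP2'
for line in reading_lines(cohort):
    print(line)
# "<walked>" "walk" V PAST
# "<walked>" "walk" V PCP2
```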


    2. If LIST or on-the-fly definitions use a tag parenthesis with
    space, e.g. (Tag1 Tag2), in a rule with the flag TAGORDER, this
    will be converted internally to /^(.* )?Tag1 Tag2( .*)?$/r.

        REMOVE TAGORDER (Tag3) IF (*1 (Tag1 Tag2)) ;


That's where it breaks. Changing how some sets are compiled (or, even worse, recompiled, in the case of non-inline sets) based on a rule flag would be a major change and a kludge. There is currently zero interaction between these two parts, and there shouldn't be. Sets don't know they are being parsed in the context of a rule, and it would be messy to add markers that keep these new kinds of sets from being deduplicated.
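Whatever the compiler ends up doing, the intended effect of the example rule REMOVE TAGORDER (Tag3) IF (*1 (Tag1 Tag2)) can be simulated in a few lines of Python. This is a sketch only, assuming a sentence is a list of cohorts, each cohort a list of readings, and each reading a list of tags whose space-joined form stands in for LINE (the word form is omitted for brevity). As in CG, the last reading of a cohort is never removed; BARRIERs, careful tests and rule ordering are ignored. ordered_regex and remove_if_right are hypothetical names.

```python
import re


def ordered_regex(tags: list[str]) -> re.Pattern:
    """Compile the rewritten ordered-tag test ^(.* )?Tag1 Tag2( .*)?$."""
    return re.compile("^(.* )?" + " ".join(map(re.escape, tags)) + "( .*)?$")


def remove_if_right(sentence, target, ordered_context):
    """Simulate REMOVE TAGORDER (target) IF (*1 (ordered_context)) on a toy sentence."""
    context = ordered_regex(ordered_context)
    for i, cohort in enumerate(sentence):
        # *1: any cohort to the right with a reading whose LINE has the tags in order.
        if not any(context.match(" ".join(reading))
                   for later in sentence[i + 1:] for reading in later):
            continue
        kept = [reading for reading in cohort if target not in reading]
        if kept:  # never delete the last remaining reading of a cohort
            cohort[:] = kept
    return sentence


in_order = [[["a", "Tag3"], ["a", "X"]], [["b", "Tag1", "Tag2"]]]
wrong_order = [[["a", "Tag3"], ["a", "X"]], [["b", "Tag2", "Tag1"]]]
print(remove_if_right(in_order, "Tag3", ["Tag1", "Tag2"]))
# [[['a', 'X']], [['b', 'Tag1', 'Tag2']]]
print(remove_if_right(wrong_order, "Tag3", ["Tag1", "Tag2"]))
# [[['a', 'Tag3'], ['a', 'X']], [['b', 'Tag2', 'Tag1']]]
```

The second sentence shows the point of TAGORDER: the same two tags in the opposite order no longer satisfy the context.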

    In addition to, or instead of, TAGORDER at the rule level, we
    could also introduce the concept of a "nonbreaking space
    character", e.g. · (mini-bullet) or double underscore, to allow
    flexible use of tag order down at the level of individual
    contexts: (Tag1·Tag2) or (Tag1__Tag2).


In the final solution, I will need to introduce regex-like placeholders such as * . .+ .* (or whatever) for zero, one, one-or-more and zero-or-more arbitrary tags, to let grammar writers express everything.
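One way such placeholders could work, sketched in Python under assumptions the thread leaves open: each symbol is a separate token, with * as explicit adjacency (zero tags in between), . as exactly one tag, .+ as one or more and .* as zero or more. Tags are assumed to contain no spaces, and a literal tag named . or * cannot be written. The pattern is matched against a space-padded LINE string so that tag boundaries are respected. compile_tag_pattern and line_matches are hypothetical names.

```python
import re

# Assumed meaning of each placeholder token, as regex fragments over "tag " units.
PLACEHOLDERS = {
    "*": "",               # zero tags in between (explicit adjacency)
    ".": r"\S+ ",          # exactly one tag
    ".+": r"(?:\S+ )+",    # one or more tags
    ".*": r"(?:\S+ )*",    # zero or more tags
}


def compile_tag_pattern(tokens: list[str]) -> re.Pattern:
    """Compile e.g. ["V", ".*", "SG"] into a regex over a space-padded LINE."""
    body = "".join(PLACEHOLDERS.get(tok, re.escape(tok) + " ") for tok in tokens)
    return re.compile(" " + body)


def line_matches(tokens: list[str], line: str) -> bool:
    # Padding makes every tag look like " tag ", so "V" never matches inside "XV".
    return compile_tag_pattern(tokens).search(" " + line + " ") is not None


line = '"walk" V PAST SG'
print(line_matches(["V", "PAST"], line))        # True
print(line_matches(["PAST", "V"], line))        # False
print(line_matches(["V", ".", "SG"], line))     # True
print(line_matches(["V", ".", "PAST"], line))   # False
print(line_matches(["V", ".*", "PAST"], line))  # True
```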

    Tino, is my intuition correct that it would not be so hard to turn
    this algorithmic idea into code? And what would it cost,
    speed-wise? Given that it would be relevant only for some rules, I
    guess it can't be too bad.


Code-wise, it's not worth delaying the actual implementation for. It would reach far into many corners of the code without actually getting us any closer to the correct solution.

Speed-wise, it would be bad.

-- Tino Didriksen

--
You received this message because you are subscribed to the Google Groups "Constraint Grammar" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/constraint-grammar.
For more options, visit https://groups.google.com/d/optout.


--
Eckhard Bick,
cand.med., dr.phil.
University of Southern Denmark
e-mail: [email protected]
web: http://beta.visl.sdu.dk
