Not sure I see the problem ...
I wasn't talking about changing sets on the fly at run time, only about a
*one-time* rewriting: Tag1·Tag2 tag strings would be rewritten as //r
regexes in order to preserve tag order. This rewriting would happen only
once, at compile time, not at run time. And if we basically have a LINE tag
already, the rewritten tag strings could be matched against this LINE.
The tag string rewriting is so simple that it could be done by a grammar
preprocessor rather than the actual compiler.
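To illustrate how small that rewriting step could be, here is a minimal
Python sketch of such a preprocessor function. It is only an assumption of
how it might look: the function name rewrite_ordered_tags is hypothetical,
the separators are the two suggested in this thread (· and double
underscore), and the output pattern is the one proposed in the quoted
message below. A real preprocessor would also have to make sure that a
double underscore inside an ordinary tag is not misread as a separator.

```python
import re

# Separators proposed in this thread for "ordered" tag sequences.
# Treating every "__" as a separator is an assumption of this sketch.
SEPARATOR = re.compile(r"·|__")


def rewrite_ordered_tags(tag_string: str) -> str:
    """Rewrite 'Tag1·Tag2' as '/^(.* )?Tag1 Tag2( .*)?$/r'.

    Strings without a separator (or with an empty part) are returned
    unchanged, so ordinary tags pass through untouched.
    """
    parts = SEPARATOR.split(tag_string)
    if len(parts) < 2 or not all(parts):
        return tag_string
    # Escape each tag so that regex metacharacters in it match literally.
    body = " ".join(re.escape(part) for part in parts)
    return f"/^(.* )?{body}( .*)?$/r"


if __name__ == "__main__":
    for example in ("Tag1·Tag2", "Tag1__Tag2", "Tag3"):
        print(example, "->", rewrite_ordered_tags(example))
```

Since this works on the tag string alone, it could run over the grammar
file before the compiler ever sees it.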
What would need to be adapted is only that these ordered-tag regexes
would be recognized as such and tested against LINE rather than against
ordinary tags or sets in the cohort. Isn't that just a simple IF branch
during target/context matching? A bit like treating $$ unification sets
differently?
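The branch I have in mind can be sketched like this in Python. This is
purely illustrative and says nothing about how the matcher is actually
structured: TagPattern, its "ordered" marker and reading_matches are
hypothetical names, and LINE is simulated by joining the reading's ordered
tag list with single spaces.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class TagPattern:
    text: str               # plain tag, or regex source if is_regex/ordered
    is_regex: bool = False
    ordered: bool = False   # hypothetical marker set by the rewriting step


def reading_matches(pattern: TagPattern, tags: List[str]) -> bool:
    """Test one pattern against one reading, given its ordered tag list."""
    if pattern.ordered:
        # Ordered-tag regex: test against the whole reading line (LINE).
        line = " ".join(tags)
        return re.search(pattern.text, line) is not None
    if pattern.is_regex:
        # Ordinary regex tag: test against each tag separately.
        return any(re.search(pattern.text, tag) for tag in tags)
    # Plain tag: simple membership test.
    return pattern.text in tags


if __name__ == "__main__":
    ordered = TagPattern(r"^(.* )?Tag1 Tag2( .*)?$", ordered=True)
    print(reading_matches(ordered, ['"<w>"', '"w"', "Tag1", "Tag2", "Tag3"]))
    print(reading_matches(ordered, ['"<w>"', '"w"', "Tag2", "Tag1", "Tag3"]))
```

The first reading matches because Tag1 is immediately followed by Tag2;
the second has the same tags in the other order and does not.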
Of course I'm talking algorithmically here, not claiming to predict the
complexity of an implementation. But it looks feasible to me.
-- Eckhard
On 06/13/2017 08:46 PM, Tino Didriksen wrote:
Replied inline...
On 10 June 2017 at 07:40, Eckhard Bick <[email protected]> wrote:
1. We introduce a magic tag LINE, maintained by the compiler,
constituted by the *whole* reading line (plus the word form at the
start) as *one* tag, i.e. *without breaking on space*.
That part is easy and basically already done. The reading already
stores an ordered list of tags - this is not the problem.
2. If LIST or on-the-fly definitions use a tag parenthesis with
space, e.g. (Tag1 Tag2), in a rule with the flag TAGORDER, this
will be converted internally to /^(.* )?Tag1 Tag2( .*)?$/r.
REMOVE TAGORDER (Tag3) IF (*1 (Tag1 Tag2)) ;
That's where it breaks. To change how some sets are compiled (or even
worse, recompiled for non-inline sets) based on a rule flag is a major
change and kludge. There is currently zero interaction between these
two parts, and there shouldn't be. Sets don't know they are being
parsed in the context of a rule, and it would be messy to add markers
to not deduplicate these new kinds of sets.
In addition to, or instead of, TAGORDER at the rule level, we
could also introduce the concept of a "nonbreaking space
character", e.g. · (mini-bullet) or double underscore, to allow
flexible use of tag order down at the level of individual
contexts: (Tag1·Tag2) or (Tag1__Tag2).
In the final solution, I will need to introduce regex-like * . .+ .*
(or whatever) as placeholders for zero, one, one-or-more, zero-or-more
any-tags to let writers express everything.
Tino, is my intuition correct that it would not be so hard to turn
this algorithmic idea into code? And what would it cost,
speed-wise? Given that it would be relevant only for some rules, I
guess it can't be too bad.
Code-wise, not worth delaying the actual implementation for. It'd
reach far into many corners, without actually getting us any closer to
the correct solution.
Speed-wise, it would be bad.
-- Tino Didriksen
--
You received this message because you are subscribed to the Google
Groups "Constraint Grammar" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/constraint-grammar.
For more options, visit https://groups.google.com/d/optout.
--
Eckhard Bick,
cand.med., dr.phil.
University of Southern Denmark
e-mail: [email protected]
web: http://beta.visl.sdu.dk