First of all, I should mention that I don't consider myself a language
developer (more someone who messes around with everything).

- I think I would leave this to the "secondary tag" developer, similar to
what we already do for the "primary tag" developer. For example, nothing
currently forbids a primary tag containing any symbol, as long as it's not
a stream-related one (<, >, ^, $, +).
- Like Jonathan, I think we don't need things like <surfaceform:xxxx>.
It's too long, and would probably clutter the stream too much. (Let's
remember that, even if the stream is not meant to be "human read", it is
somewhat "human readable", and keeping it as concise as possible helps.)
- That said, I would *strongly encourage* the secondary tag developer to
use meaningful secondary tag prefixes, the same way we have meaningful
primary tags. While we don't have <name> or <preposition>, we also don't
have <€> and <£>, but <n> and <pr>. Having meaningful tags is an awesome
feature of the stream: it makes it relatively simple to manually create
input for any part of the pipeline (to test a specific command, to write
tests, ...).

So I would *recommend* having short lowercase prefixes that make it easy
to understand (or at least to remember, once seen) what the secondary
tag is about.
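
To illustrate why short lowercase prefixes stay easy to handle, here is a
minimal Python sketch that splits the tags of one reading into primary and
secondary ones. It assumes the <prefix:value> form proposed in this thread
(which is not a finalized format), the function name split_tags is made up
for the example, and it ignores escaped characters, so take it as an
illustration only, not as how any Apertium tool works.

```python
import re

# Assumption: secondary tags look like <prefix:value> with a short
# lowercase prefix, as proposed in this thread. Not a finalized format.
TAG_RE = re.compile(r"<([^<>]+)>")
SECONDARY_RE = re.compile(r"^([a-z]+):(.*)$")


def split_tags(reading):
    """Return (lemma, primary_tags, secondary_tags) for one reading.

    Hypothetical helper for illustration; escaped characters such as
    a backslash before < are not handled.
    """
    lemma = reading.split("<", 1)[0]
    primary = []
    secondary = {}
    for tag in TAG_RE.findall(reading):
        match = SECONDARY_RE.match(tag)
        if match:
            # prefix -> value, e.g. "sf" -> "House"
            secondary[match.group(1)] = match.group(2)
        else:
            primary.append(tag)
    return lemma, primary, secondary


lemma, primary, secondary = split_tags("house<n><sg><sf:House>")
print(lemma, primary, secondary)
# prints: house ['n', 'sg'] {'sf': 'House'}
```

With a symbol prefix such as <%:cdefg> the second regex would need a
different character class, but the rest would stay the same.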


Message from Francis Tyers <fty...@prompsit.com> on Sun, 10 May 2020
at 16:07:

> On 2020-05-10 14:51, Samuel Sloniker wrote:
> > Would it be worth designing a parsing library?
> >
> > On Sun, May 10, 2020 at 3:15 AM Flammie A Pirinen <flam...@iki.fi>
> > wrote:
> >
> >> On Fri, May 08, 2020 at 04:50:45PM +0200, Tino Didriksen wrote:
> >>> For khannatanmai's GSoC project, secondary tags will be implemented
> >>> in a backwards compatible manner. That in itself is indisputable.
> >>> But there is a question of how the initial batch of secondary tags
> >>> should look.
> >>>
> >>> I feel they should be in the form of <sf:cdefg>, as in a very short
> >>> textual lower-case prefix, followed by :, followed by whatever value
> >>> there is. Or even an upper-case prefix, as in <S:cdefg> or
> >>> <SF:cdefg>.
> >>>
> >>> spectie wants symbol prefixes in the form of <%:cdefg>.
> >>
> >> I feel like this is just a bikeshed[0] issue, but since I want this
> >> project to succeed I'll give my 2 cents / rants:
> >>
> >> I don't personally find the apertium stream format readable; if I
> >> need to make sense of it I will anyway have to preprocess a lot,
> >> enough that I'd say the apertium stream format needs visualisation
> >> scripts to be readable. It's not very hard to have dev scripts for
> >> this. That being said, I don't find the apertium stream format very
> >> machine readable either: with regexes you need tons of escapes and
> >> double escapes, and with programming languages... well, you have to
> >> use regexes, because it's not a standard format with a readily
> >> available parsing library, or a format neatly designed for Python
> >> split() or C strtok(), or so... I'm fine with either special
> >> symbols or strings for whatever. As a purely personal preference
> >> I've been pro feature=value even before UD times, but that's not
> >> important; as long as stuff is handleable with grep and sed without
> >> convoluted expressions it's all good, no? To that goal, on the
> >> question of having a known set of prefixes: I have always been of
> >> the opinion that any mature release-quality apertium stuff would
> >> follow the tags documentation on the wiki[1], and I would expect
> >> the same to be true for prefixes as well.
> >>
> >> One side note: I think there is a level of abstraction we often
> >> overlook in these developments; a part of the language data
> >> developer base will probably interact with these secondary things
> >> through the XML formats, if I understand correctly? Surely one of
> >> the things that can be done, regardless of what kind of stream
> >> format representation the secondary stuff has, is to make the XML
> >> format part more self-documenting and the stream format more
> >> readable? And eventually one could imagine tooling and
> >> visualisations or whatnot to support whatever readable and parsable
> >> formats, if enough stuff is in the XML sources.
> >>
> >> So tl;dr: just pick whatever greppable stuff for the apertium stream
> >> format.
> >>
> >> [0] <http://black.bikeshed.com/>
> >> [1] <https://wiki.apertium.org/wiki/List_of_symbols>
> >>
> >> --
> >> Regards, Flammie <https://flammie.github.io>
> >> (Please note, that I will often include my replies inline instead of
> >> top or bottom of the mail)
> >> _______________________________________________
> >> Apertium-stuff mailing list
> >> Apertium-stuff@lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>
> There is already
> https://github.com/apertium/streamparser
>
> for Python...
>
> Fran


-- 
< Xavi Ivars >
< http://xavi.ivars.me >