On 2020-05-10 14:51, Samuel Sloniker wrote:
Would it be worth designing a parsing library?

On Sun, May 10, 2020 at 3:15 AM Flammie A Pirinen <flam...@iki.fi>
wrote:

On Fri, May 08, 2020 at 04:50:45PM +0200, Tino Didriksen wrote:
For khannatanmai's GSoC project, secondary tags will be implemented in
a backwards compatible manner. That in itself is indisputable. But
there is a question of how the initial batch of secondary tags should
look.

I feel they should be in the form of <sf:cdefg>, as in a very short
textual lower-case prefix, followed by :, followed by whatever value
there is. Or even an upper-case prefix, as in <S:cdefg> or <SF:cdefg>.

spectie wants symbol prefixes in the form of <%:cdefg>.

I feel like this is just a bikeshed[0] issue, but since I want this
project to succeed I'll give my 2 cents / rants:

I don't personally find the apertium stream format readable; if I need
to make sense of it I have to preprocess a lot anyway, enough that I'd
say the apertium stream format needs visualisation scripts to be
readable. It's not very hard to have dev scripts for this. That being
said, I don't find the apertium stream format very machine-readable
either: with regexes you need tons of escapes and double escapes, and
with programming languages... well, you have to use regexes, because
it's not a standard format with a readily available parsing library,
nor a format neatly designed for Python split() or C strtok(), or so...
I'm fine with either special symbols or strings for whatever; as a
purely personal preference I've been pro feature=value even before UD
times, but that's not important. As long as stuff is handleable with
grep and sed without convoluted expressions it's all good, no? To that
goal, on the question of having a known set of prefixes: I have always
been of the opinion that any mature, release-quality apertium stuff
would follow the tags documentation on the wiki[1], and I would expect
the same to be true for prefixes as well.
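
To illustrate the greppability point, here is a minimal Python sketch
of how a prefixed secondary tag could be pulled apart with one short
regex. It assumes the <prefix:value> shape proposed above, treats any
tag without a colon as an ordinary tag, and ignores escaped characters;
the name split_tags is made up for this example and is not part of any
apertium tool.

```python
import re

# Assumption: a secondary tag looks like <prefix:value>, where the prefix
# is a short string (sf, S, SF) or a single symbol such as %.
# Ordinary tags like <n> or <sg> contain no colon.
# Escaped characters in the stream are NOT handled in this sketch.
TAG = re.compile(r"<([^<>:]+)(?::([^<>]*))?>")


def split_tags(reading):
    """Split one reading, e.g. 'dog<n><sg><sf:cdefg>', into its tags.

    Returns (primary, secondary): a list of ordinary tag names and a
    dict mapping each secondary-tag prefix to its value.
    """
    primary, secondary = [], {}
    for match in TAG.finditer(reading):
        name, value = match.group(1), match.group(2)
        if value is None:
            # No colon in the tag: treat it as an ordinary tag.
            primary.append(name)
        else:
            # Colon present: the part before it is the prefix.
            secondary[name] = value
    return primary, secondary
```

With this shape, split_tags("dog<n><sg><sf:cdefg>") gives
(['n', 'sg'], {'sf': 'cdefg'}), and a symbol prefix such as <%:cdefg>
goes through the same expression unchanged.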

One side note: I think there is a level of abstraction we often
overlook in these developments; part of the language data developer
base will probably interact with these secondary things through the XML
formats, if I understand correctly? Surely one of the things that can
be done, regardless of what kind of stream format representation the
secondary stuff has, is to make the XML format part more
self-documenting and the stream format more readable? And eventually
one could imagine tooling and visualisations or whatnot to support
whatever readable and parsable formats, if enough stuff is in the XML
sources.

So tl;dr: just pick whatever greppable stuff for the apertium stream
format.

[0] <http://black.bikeshed.com/>
[1] <https://wiki.apertium.org/wiki/List_of_symbols>

--
Regards, Flammie <https://flammie.github.io>
(Please note, that I will often include my replies inline instead of
top or bottom of the mail)
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff

There is already https://github.com/apertium/streamparser for Python...

Fran

