On Fri, May 08, 2020 at 04:50:45PM +0200, Tino Didriksen wrote:
> For khannatanmai's GSoC project, secondary tags will be implemented in a
> backwards compatible manner. That is in itself indisputable. But, there is
> a question of how the initial batch of secondary tags should look.
> 
> I feel they should be in the form of <sf:cdefg>, as in a very short textual
> lower-case prefix, followed by :, followed by whatever value there is. Or
> even an upper-case prefix, as in <S:cdefg> or <SF:cdefg>.
> 
> spectie wants symbol prefixes in the form of <%:cdefg>.

I feel like this is just a bikeshed[0] issue, but since I want this
project to succeed I'll give my 2 cents / rants:

I don't personally find the Apertium stream format readable; if I need
to make sense of it I have to preprocess a lot anyway, enough that I'd
say the stream format needs visualisation scripts to be readable. It's
not very hard to have dev scripts for this. That being said, I don't
find the stream format very machine-readable either: with regexes you
need tons of escapes and double escapes, and with programming
languages... well, you have to use regexes, because it's not a standard
format with a readily available parsing library, nor a format neatly
designed for Python's split() or C's strtok() or the like.

I'm fine with either special symbols or strings for the prefixes. As a
purely personal preference I've been pro feature=value since before UD
times, but that's not important; as long as the stuff can be handled
with grep and sed without convoluted expressions, it's all good, no? To
that end, on the question of having a known set of prefixes: I have
always been of the opinion that any mature, release-quality Apertium
stuff should follow the tag documentation on the wiki[1], and I would
expect the same to hold for prefixes.
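
To make the regex point concrete, here is a minimal Python sketch of
pulling prefixed tags out of a single reading. It is only an
illustration under assumptions: the function name split_reading is made
up, the input is one already-isolated reading such as
dog<n><pl><sf:dogs>, a backslash escapes the following character, and
no ordinary (primary) tag contains a colon. It works the same way for a
textual prefix like sf: and a symbol prefix like %:.

```python
import re

# Either an escaped character (skipped) or a <tag>; group 1 is the tag body.
# Assumption: a backslash escapes the next character in the stream format.
TOKEN = re.compile(r"\\.|<((?:\\.|[^<>\\])*)>")


def split_reading(reading):
    """Split one reading into (lemma, primary tags, prefixed tags).

    Hypothetical helper: tags of the form <prefix:value> are treated as
    secondary tags; everything else in angle brackets is a primary tag.
    """
    lemma_end = None
    primary = []
    secondary = {}
    for match in TOKEN.finditer(reading):
        body = match.group(1)
        if body is None:
            # An escaped character such as \< is not a tag.
            continue
        if lemma_end is None:
            # The lemma is everything before the first real tag.
            lemma_end = match.start()
        prefix, sep, value = body.partition(":")
        if sep:
            secondary[prefix] = value
        else:
            primary.append(body)
    lemma = reading if lemma_end is None else reading[:lemma_end]
    return lemma, primary, secondary
```

Even this small case needs an escape-aware pattern rather than a plain
split(), which is roughly the complaint above; the choice of prefix
style makes no difference to the code.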

One side note: I think there is a level of abstraction we often overlook
in these developments. Part of the language data developer base will
probably interact with these secondary tags through the XML formats, if
I understand correctly? Surely one thing that can be done, regardless of
how the secondary tags are represented in the stream format, is to make
the XML side more self-documenting and the stream format more readable?
And eventually, if enough information is in the XML sources, there could
be tooling and visualisations or whatnot to support whatever readable
and parsable formats we want.

So, tl;dr: just pick whatever is greppable for the Apertium stream format.

[0] <http://black.bikeshed.com/>
[1] <https://wiki.apertium.org/wiki/List_of_symbols>

-- 
Regards, Flammie <https://flammie.github.io>
(Please note, that I will often include my replies inline instead of
top or bottom of the mail)

