On Fri, May 08, 2020 at 04:50:45PM +0200, Tino Didriksen wrote:
> For khannatanmai's GSoC project, secondary tags will be implemented in a
> backwards compatible manner. That in itself is indisputable. But, there is
> a question of how the initial batch of secondary tags should look.
>
> I feel they should be in the form of <sf:cdefg>, as in a very short textual
> lower-case prefix, followed by :, followed by whatever value there is. Or
> even an upper-case prefix, as in <S:cdefg> or <SF:cdefg>.
>
> spectie wants symbol prefixes in the form of <%:cdefg>.
I feel like this is just a bikeshed[0] issue, but since I want this project to succeed, I'll give my 2 cents / rants:

I don't personally find the Apertium stream format readable. If I need to make sense of it, I will have to preprocess it a lot anyway, enough that I'd say the stream format needs visualisation scripts to be readable. It's not very hard to have dev scripts for this.

That being said, I don't find the Apertium stream format very machine-readable either: with regexes you need tons of escapes and double escapes, and with programming languages... well, you have to use regexes anyway, because it's not a standard format with a readily available parsing library, or a format neatly designed for Python's split() or C's strtok(), or so...

I'm fine with either special symbols or strings for whatever. As a purely personal preference I've been pro feature=value even before UD times, but that's not important; as long as stuff is handleable with grep and sed without convoluted expressions, it's all good, no?

To that goal, on the question of having a known set of prefixes: I have always been of the opinion that any mature, release-quality Apertium stuff would follow the tags documentation on the wiki[1], and I would expect the same to be true for prefixes as well.

One side note: I think there is a level of abstraction we often overlook in these developments. If I understand correctly, part of the language data developer base will probably interact with these secondary things through the XML formats? Surely one of the things that can be done, regardless of what stream format representation the secondary stuff gets, is to make the XML format part more self-documenting and the stream format more readable? And eventually one could imagine tooling and visualisations or whatnot to support whatever readable and parsable formats, if enough stuff is in the XML sources.

So tl;dr: just pick whatever greppable stuff for the Apertium stream format.
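To illustrate the "greppable" point, here is a minimal sketch of pulling `<prefix:value>` secondary tags out of a stream-format lexical unit (`^surface/analysis$`) with only a couple of simple regexes and Python's `str.partition()`. It assumes, as in the proposal quoted above, that a secondary tag is any tag containing `:`; the names `LU_RE`, `TAG_RE` and `split_tags` are hypothetical, and the sketch deliberately ignores escaped characters, multiple analyses and superblanks, so it is not a real stream parser.

```python
import re

# One lexical unit: ^surface/analysis$ (simplified: no escapes, one analysis).
LU_RE = re.compile(r"\^([^/$]*)/([^$]*)\$")
# Any tag between angle brackets, e.g. <n> or <sf:cdefg>.
TAG_RE = re.compile(r"<([^>]*)>")


def split_tags(analysis):
    """Split one analysis into (lemma, primary_tags, secondary_tags).

    Assumption: tags containing ':' are secondary (<prefix:value>);
    all other tags are primary.
    """
    lemma = analysis.split("<", 1)[0]
    primary, secondary = [], {}
    for tag in TAG_RE.findall(analysis):
        prefix, sep, value = tag.partition(":")
        if sep:
            secondary[prefix] = value
        else:
            primary.append(tag)
    return lemma, primary, secondary


# Hypothetical example stream using both proposed prefix styles.
stream = "^houses/house<n><pl><sf:cdefg>$ ^are/be<vbser><pres><%:cdefg>$"
for surface, analysis in LU_RE.findall(stream):
    print(surface, split_tags(analysis))
```

Whether the prefix is textual (`sf`) or a symbol (`%`), the same code handles it; on the command line something like `grep -o '<sf:[^>]*>'` would do the equivalent extraction for a known prefix.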
[0] <http://black.bikeshed.com/>
[1] <https://wiki.apertium.org/wiki/List_of_symbols>

-- 
Regards, Flammie <https://flammie.github.io>
(Please note that I will often include my replies inline instead of at the top or bottom of the mail)
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff