Would it be worth designing a parsing library?

On Sun, May 10, 2020 at 3:15 AM Flammie A Pirinen <flam...@iki.fi> wrote:

> On Fri, May 08, 2020 at 04:50:45PM +0200, Tino Didriksen wrote:
> > For khannatanmai's GSoC project, secondary tags will be implemented in a
> > backwards compatible manner. That in itself is indisputable. But there is
> > a question of how the initial batch of secondary tags should look.
> >
> > I feel they should be in the form of <sf:cdefg>, as in a very short textual
> > lower-case prefix, followed by :, followed by whatever value there is. Or
> > even an upper-case prefix, as in <S:cdefg> or <SF:cdefg>.
> >
> > spectie wants symbol prefixes in the form of <%:cdefg>.
>
> I feel like this is just a bikeshed[0] issue, but since I want this
> project to succeed I'll give my 2 cents / rants:
>
> I don't personally find apertium stream format readable; if I need to
> make sense of it I will have to preprocess a lot anyway, enough that
> I'd say apertium stream format needs visualisation scripts to be
> readable. It's not very hard to have dev scripts for this. That being
> said, I don't find apertium stream format very machine readable either;
> with regexes you need tons of escapes and double escapes, with
> programming languages... well, you have to use regexes because it's not
> a standard format with a readily available parsing library or a format
> neatly designed for python split() or C strtok(), or so... I'm fine with
> either special symbols or strings for whatever; as a purely personal
> preference I've been pro feature=value even before UD times but that's
> not important. As long as stuff is handlable with grep and sed without
> convoluted expressions it's all good, no? To that goal, on the question
> of having a known set of prefixes, I have always been of the opinion that
> any mature release-quality apertium stuff would follow the tags docu on
> the wiki[1]; I would expect similar to be true for prefixes as well.
>
> One side note: I think there is a level of abstraction we often overlook
> in these developments; a part of the language data developer base will
> probably interact with these secondary things through the XML formats, if
> I understand correctly? Surely one of the things that can be done,
> regardless of what kind of stream format representation the secondary
> stuff has, is to make the xml format part more self-documenting and the
> stream format more readable? And eventually one could think there
> would be tooling and visualisations or whatnot to support whatever readable
> and parsable formats, if enough stuff is in the xml sources.
>
> so tldr; just pick whatever greppable stuff for apertium stream format.
>
> [0] <http://black.bikeshed.com/>
> [1] <https://wiki.apertium.org/wiki/List_of_symbols>
>
> --
> Regards, Flammie <https://flammie.github.io>
> (Please note, that I will often include my replies inline instead of
> top or bottom of the mail)
> _______________________________________________
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>
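
To make the regex-versus-parser point above a bit more concrete, here is a
minimal sketch in Python of what a tiny parsing helper could look like. It is
not an existing Apertium API: the names Reading and parse_reading are
hypothetical, and the assumptions are that a reading looks like
lemma<tag><tag>, that a backslash escapes the next character, and that a
secondary tag is any tag of the form <prefix:value>, as in the examples
quoted above.

```python
from typing import Dict, List, NamedTuple


class Reading(NamedTuple):
    """Hypothetical container for one parsed reading."""
    lemma: str
    tags: List[str]            # plain tags, e.g. ["n", "pl"]
    secondary: Dict[str, str]  # prefix -> value, e.g. {"sf": "cdefg"}


def parse_reading(text: str) -> Reading:
    """Split a reading such as 'dog<n><sf:cdefg>' into its parts.

    Assumption: a backslash escapes the following character.
    Limitation: an escaped ':' inside a tag is still treated as the
    prefix/value separator, and text after the last tag is appended
    to the lemma.
    """
    lemma: List[str] = []
    tags: List[str] = []
    secondary: Dict[str, str] = {}
    buf: List[str] = []
    in_tag = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Keep the escaped character literally, drop the backslash.
            (buf if in_tag else lemma).append(text[i + 1])
            i += 2
            continue
        if ch == "<" and not in_tag:
            in_tag = True
            buf = []
        elif ch == ">" and in_tag:
            in_tag = False
            prefix, sep, value = "".join(buf).partition(":")
            if sep and prefix:
                secondary[prefix] = value   # secondary tag: prefix:value
            else:
                tags.append("".join(buf))   # ordinary tag
        elif in_tag:
            buf.append(ch)
        else:
            lemma.append(ch)
        i += 1
    return Reading("".join(lemma), tags, secondary)
```

Because the separator is just the first colon, the same sketch accepts either
proposed prefix style: parse_reading("dog<n><sf:cdefg>") and
parse_reading("dog<n><%:cdefg>") both put cdefg under their prefix, so the
choice between textual and symbol prefixes would not change the parsing
logic here.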