Would it be worth designing a parsing library?
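
To give a rough idea of the scope, here is a minimal sketch in Python of what
one helper in such a library might do. It assumes the <prefix:value> shape
proposed in the quoted mail below and backslash escaping in the stream format.
The name split_tags and its return shape are hypothetical, not an existing
Apertium API, and the secondary tag format itself is still undecided.

```python
import re

# Either an escaped character (skipped) or one <...> tag (captured).
# Consuming escapes first means that "\<" never starts a tag.
TOKEN_RE = re.compile(r"\\.|<([^<>]*)>")


def split_tags(reading):
    """Split one analysis such as 'house<n><sg><sf:cdefg>'.

    Returns (lemma, primary_tags, secondary_tags). A tag counts as
    secondary when it has a non-empty prefix before a colon; that rule
    is an assumption taken from the proposal in this thread.
    """
    lemma_end = None
    primary = []
    secondary = {}
    for match in TOKEN_RE.finditer(reading):
        tag = match.group(1)
        if tag is None:
            continue  # escaped character, not a tag
        if lemma_end is None:
            lemma_end = match.start()  # lemma stops at the first real tag
        prefix, sep, value = tag.partition(":")
        if sep and prefix:
            secondary[prefix] = value
        else:
            primary.append(tag)
    lemma = reading if lemma_end is None else reading[:lemma_end]
    return lemma, primary, secondary
```

With this sketch, split_tags("house<n><sg><sf:cdefg>") yields the lemma
"house", the primary tags n and sg, and one secondary tag sf with the value
cdefg. A symbol prefix such as <%:cdefg> goes through the same code path, so
the choice between letters and symbols would not matter to a parser like this.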

On Sun, May 10, 2020 at 3:15 AM Flammie A Pirinen <flam...@iki.fi> wrote:

> On Fri, May 08, 2020 at 04:50:45PM +0200, Tino Didriksen wrote:
> > For khannatanmai's GSoC project, secondary tags will be implemented in a
> > backwards compatible manner. That in itself is indisputable. But there is
> > a question of how the initial batch of secondary tags should look.
> >
> > I feel they should be in the form of <sf:cdefg>, as in a very short textual
> > lower-case prefix, followed by :, followed by whatever value there is. Or
> > even an upper-case prefix, as in <S:cdefg> or <SF:cdefg>.
> >
> > spectie wants symbol prefixes in the form of <%:cdefg>.
>
> I feel like this is just a bikeshed[0] issue, but since I want this
> project to succeed I'll give my 2 cents / rants:
>
> I don't personally find apertium stream format readable, if I need to
> make sense of it I will anyways have to preprocess a lot, enough that
> I'd say apertium stream format needs visualisation scripts to be
> readable. It's not very hard to have dev scripts for this. That being
> said, I don't find apertium stream format very machine readable either;
> with regexes you need tons of escapes and double escapes, with
> programming languages... well, you have to use regexes because it's not
> a standard format with a readily available parsing library or a format
> neatly designed for python split() or c strtoks, or so... I'm fine with
> either special symbols or strings for whatever, as a purely personal
> preference I've been pro feature=value even before ud times but that's
> not important, as long as stuff is handlable with grep and sed without
> convoluted expressions it's all good, no? To that goal, on the question
> of having a known set of prefixes, I have always been of the opinion that
> any mature release-quality apertium stuff would follow the tags docu on
> the wiki[1], I would expect similar to be true for prefixes as well.
>
> One side note: I think there is a level of abstraction we often overlook
> in these developments; a part of language data developer base will
> probably interact with these secondary things through the XML formats if
> I understand correctly? Surely one of the things that can be done
> regardless of what kind of stream format representation the secondary
> stuff has, is to have the xml format part more self-documenting and
> stream format more readable? And like eventually one could think there
> were tooling and visualisations or whatnot to support whatever readable
> and parsable formats if enough stuff is in the xml sources.
>
> so tldr; just pick whatever greppable stuff for apertium stream format.
>
> [0] <http://black.bikeshed.com/>
> [1] <https://wiki.apertium.org/wiki/List_of_symbols>
>
> --
> Regards, Flammie <https://flammie.github.io>
> (Please note, that I will often include my replies inline instead of
> top or bottom of the mail)
> _______________________________________________
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>