Oh, okay, that makes sense. I was also thinking it might make it easier for humans to debug the format.
On Sat, Mar 28, 2020, 14:55 Tanmai Khanna <khanna.tan...@gmail.com> wrote:

> Scoopgracie,
> We discussed something similar to this on IRC. While doing that would
> make things very consistent, it would become too verbose, which is why it
> might be easier not to use the feature:value format for primary
> information, i.e., information that's almost always going to be there, and
> to use it only for secondary/optional information.
>
> Secondly, by only adding a new format for secondary information, we
> wouldn't disturb the current data files or even the parsers too much.
>
> However, if we all think consistency should be our primary focus, this
> could be considered.
>
> Tanmai
>
> On Sun, Mar 29, 2020 at 2:29 AM Scoop Gracie <scoopgra...@gmail.com>
> wrote:
>
>> Or <pl>=<number:pl>
>>
>> On Sat, Mar 28, 2020, 13:58 Scoop Gracie <scoopgra...@gmail.com> wrote:
>>
>>> That sounds like a great idea to me. Maybe <n> could even become <pos:n>?
>>>
>>> On Sat, Mar 28, 2020, 13:51 Tanmai Khanna <khanna.tan...@gmail.com>
>>> wrote:
>>>
>>>> Hey guys,
>>>> As part of the project to eliminate trimming, I had to come up with a
>>>> way to include the surface form in the lexical unit, and hence to modify
>>>> the Apertium stream format. To do this I would have to modify the parsers
>>>> of every program in the pipeline, and if that has to happen, we discussed
>>>> on IRC that *it might be a good idea to modify the stream in such a way
>>>> that we can include an arbitrary amount of information in a lexical
>>>> unit, and each program can use whatever information it needs.*
>>>>
>>>> The current information in the lexical unit would be primary
>>>> information, and then we would have optional secondary information,
>>>> which could contain the surface form but also literally anything you
>>>> can think of (case, sentiment, pragmatic info, etc.). This would open up
>>>> a lot of possibilities for each program, and it would strengthen the
>>>> Apertium stream format considerably.
>>>>
>>>> We discussed several possible syntaxes for this new stream format, and
>>>> the one that seems best is something like this:
>>>>
>>>> ^potato<n><pl><case:aa><sf:potatoes><other-prefix:other-value>/patata<n><f><pl><more:other>$
>>>>
>>>> This doesn't change the current stream format too much. The number of
>>>> tags is already arbitrary, so that helps. The secondary tags contain a
>>>> ":", which would help distinguish them from primary tags.
>>>>
>>>> To implement this, all the parsers would still need to be modified, but
>>>> the benefits far outweigh the amount of work needed to pull this off.
>>>>
>>>> Since this would be a major, fundamental change to Apertium, I request
>>>> you all to contribute your views: any pros, cons, or suggestions about
>>>> the idea, the syntax, anything.
>>>>
>>>> Thanks and regards,
>>>> Tanmai Khanna
>>>>
>>>> --
>>>> *Khanna, Tanmai*
>>>> _______________________________________________
>>>> Apertium-stuff mailing list
>>>> Apertium-stuff@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
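To make the proposed syntax concrete, here is a minimal Python sketch of how a consumer might split a lexical unit into primary tags and secondary key:value tags. It assumes only the rule stated in the proposal (a tag containing ":" is secondary). It deliberately ignores the stream format's backslash escaping of reserved characters, blanks and superblanks. All names (`Analysis`, `parse_analysis`, `parse_lexical_unit`, `to_legacy`) are hypothetical and not part of any existing Apertium tool.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Matches one <...> tag. Escaped reserved characters (\<, \/, ...) are NOT handled here.
TAG_RE = re.compile(r"<([^<>]*)>")


@dataclass
class Analysis:
    """One reading of a lexical unit (hypothetical structure, for illustration only)."""
    lemma: str
    primary: List[str] = field(default_factory=list)         # e.g. ["n", "pl"]
    secondary: Dict[str, str] = field(default_factory=dict)  # e.g. {"sf": "potatoes"}


def parse_analysis(text: str) -> Analysis:
    """Split 'lemma<tag>...<key:value>...' into lemma, primary tags and secondary pairs."""
    start = text.find("<")
    lemma = text if start == -1 else text[:start]
    result = Analysis(lemma)
    for tag in TAG_RE.findall(text[len(lemma):]):
        key, sep, value = tag.partition(":")  # split on the first ':' only
        if sep:
            # A ':' marks secondary information; a repeated key keeps its last value.
            result.secondary[key] = value
        else:
            # Plain tags stay primary, in their original order.
            result.primary.append(tag)
    return result


def parse_lexical_unit(lu: str) -> List[Analysis]:
    """Parse '^analysis/analysis/...$' into a list of analyses."""
    if not (lu.startswith("^") and lu.endswith("$")):
        raise ValueError(f"not a lexical unit: {lu!r}")
    return [parse_analysis(part) for part in lu[1:-1].split("/")]


def to_legacy(analysis: Analysis) -> str:
    """Drop the secondary tags, giving the analysis in the current stream format."""
    return analysis.lemma + "".join(f"<{tag}>" for tag in analysis.primary)


if __name__ == "__main__":
    lu = "^potato<n><pl><case:aa><sf:potatoes><other-prefix:other-value>/patata<n><f><pl><more:other>$"
    for a in parse_lexical_unit(lu):
        print(to_legacy(a), a.secondary)
    # potato<n><pl> {'case': 'aa', 'sf': 'potatoes', 'other-prefix': 'other-value'}
    # patata<n><f><pl> {'more': 'other'}
```

`to_legacy` illustrates the compatibility point made in the thread: a program that only understands the current format could discard the secondary tags and carry on unchanged. Note that under the fully key:value variant (`<pos:n>`, `<number:pl>`), this sketch's rule would put every tag in `secondary`, so primary and secondary information would need some other way to be told apart.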