Oh, okay, that makes sense. I was also thinking it might make it easier for
humans to debug the format.

On Sat, Mar 28, 2020, 14:55 Tanmai Khanna <khanna.tan...@gmail.com> wrote:

> Scoopgracie,
> We discussed something similar to this on IRC. While doing that would
> make things very consistent, it would also make the format too verbose,
> which is why it might be better not to use the feature:value format for
> primary information, i.e., information that's almost always going to be
> there, and to use it only for secondary/optional information.
>
> Secondly, adding the new format only for secondary information wouldn't
> disturb the current data files, or even the parsers, too much.
>
> However, if we all think consistency should be our primary focus, this
> could be considered.
>
> Tanmai
>
> On Sun, Mar 29, 2020 at 2:29 AM Scoop Gracie <scoopgra...@gmail.com>
> wrote:
>
>> Or <pl>=<number:pl>
>>
>> On Sat, Mar 28, 2020, 13:58 Scoop Gracie <scoopgra...@gmail.com> wrote:
>>
>>> That sounds like a great idea to me. Maybe <n> could even become <pos:n>?
>>>
>>> On Sat, Mar 28, 2020, 13:51 Tanmai Khanna <khanna.tan...@gmail.com>
>>> wrote:
>>>
>>>> Hey guys,
>>>> As part of the project to eliminate trimming, I had to come up with a
>>>> way to include the surface form in the lexical unit, which means
>>>> modifying the Apertium stream format. Doing this requires modifying the
>>>> parser of every program in the pipeline, and since that has to happen
>>>> anyway, we discussed on IRC that *it might be a good idea to modify the
>>>> stream in such a way that we can include an arbitrary amount of
>>>> information in a lexical unit, and each program can use whatever
>>>> information it needs.*
>>>>
>>>> The current information in the lexical unit would be primary
>>>> information, and then we would have optional secondary information, which
>>>> could contain the surface form but also literally anything you can think
>>>> of (case, sentiment, pragmatic info, etc.). This would open up a lot of
>>>> possibilities for each program, and it would strengthen the Apertium
>>>> stream format considerably.
>>>>
>>>> We discussed several possible syntaxes for this new stream format, and
>>>> the one that seems best is something like this:
>>>>
>>>> ^potato<n><pl><case:aa><sf:potatoes><other-prefix:other-value>/patata<n><f><pl><more:other>$
>>>>
>>>> This doesn't mess with the current stream format too much. The number
>>>> of tags is already arbitrary, so that helps. The secondary tags contain a
>>>> ":", which would distinguish them from primary tags.
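
A rough sketch, in Python, of how a program might split a lexical unit in the proposed syntax into its lemma, its primary tags and its secondary "key:value" tags. This is not code from any Apertium tool: the names Analysis, parse_analysis and parse_lexical_unit are hypothetical, and the sketch assumes there are no escaped characters (such as \/ or \<) inside the unit and that a secondary tag is split at its first ":".

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical helper, not part of any existing Apertium parser.
# Assumption: the unit contains no escaped characters (\/, \<, \$, ...).
TAG_RE = re.compile(r"<([^<>]*)>")


@dataclass
class Analysis:
    lemma: str
    primary: list[str] = field(default_factory=list)         # e.g. ["n", "pl"]
    secondary: dict[str, str] = field(default_factory=dict)  # e.g. {"sf": "potatoes"}


def parse_analysis(text: str) -> Analysis:
    """Split 'lemma<t1><t2><key:value>...' into lemma, primary and secondary tags."""
    first = text.find("<")
    lemma = text if first == -1 else text[:first]
    result = Analysis(lemma)
    for tag in TAG_RE.findall(text[len(lemma):]):
        if ":" in tag:
            # Secondary tag: split on the first ':' only (an assumption of this sketch).
            key, value = tag.split(":", 1)
            result.secondary[key] = value
        else:
            result.primary.append(tag)
    return result


def parse_lexical_unit(unit: str) -> list[Analysis]:
    """Parse '^...$' into its '/'-separated analyses."""
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError(f"not a lexical unit: {unit!r}")
    return [parse_analysis(part) for part in unit[1:-1].split("/")]


if __name__ == "__main__":
    unit = ("^potato<n><pl><case:aa><sf:potatoes><other-prefix:other-value>"
            "/patata<n><f><pl><more:other>$")
    src, tgt = parse_lexical_unit(unit)
    print(src.lemma, src.primary, src.secondary)
    # potato ['n', 'pl'] {'case': 'aa', 'sf': 'potatoes', 'other-prefix': 'other-value'}
    print(tgt.lemma, tgt.primary, tgt.secondary)
    # patata ['n', 'f', 'pl'] {'more': 'other'}
```

Unknown secondary keys simply end up in the dictionary, so each program could read the ones it needs and leave the rest alone.
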
>>>>
>>>> To implement this, all the parsers would still need to be modified, but
>>>> the benefits far outweigh the amount of work needed to pull this off.
>>>>
>>>> Since this would be a major, fundamental change to Apertium, I'd ask
>>>> you all to share your views: any pros, cons, or suggestions about the
>>>> idea, the syntax, anything.
>>>>
>>>> Thanks and Regards,
>>>> Tanmai Khanna
>>>>
>>>> --
>>>> *Khanna, Tanmai*
>>>> _______________________________________________
>>>> Apertium-stuff mailing list
>>>> Apertium-stuff@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>>>
>>
>
>
> --
> *Khanna, Tanmai*
>