Scoopgracie,
We discussed something similar to this on IRC. While doing that would
make things very consistent, it would also become too verbose. That is why
it might be easier not to use the feature:value format for primary
information, i.e., information that's almost always going to be there, and
to use it only for secondary/optional information.

Secondly, adding a new format only for secondary information wouldn't
disturb the current data files or even the parsers too much.
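
To give a rough idea of how small the parser change could be, here is a
minimal Python sketch that splits the tags of the example lexical unit from
my first mail into primary tags (no ":") and secondary tags (prefix:value).
The names parse_reading and parse_lexical_unit are hypothetical and not part
of any Apertium tool. The sketch also assumes no escaped characters and no
"/", "<" or ">" inside a value, which a real parser would have to handle.

```python
import re

# Matches the content of one <...> tag. Assumption: no escaped brackets.
TAG_RE = re.compile(r"<([^<>]+)>")


def parse_reading(reading):
    """Split one reading into (lemma, primary_tags, secondary_tags)."""
    lemma = reading.split("<", 1)[0]
    primary = []      # existing tags such as n, pl: order is preserved
    secondary = {}    # proposed prefix:value tags such as sf:potatoes
    for tag in TAG_RE.findall(reading):
        if ":" in tag:
            key, value = tag.split(":", 1)
            secondary[key] = value
        else:
            primary.append(tag)
    return lemma, primary, secondary


def parse_lexical_unit(unit):
    """Parse '^reading/reading$' into a list of parsed readings."""
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("lexical unit must start with '^' and end with '$'")
    return [parse_reading(r) for r in unit[1:-1].split("/")]


example = ("^potato<n><pl><case:aa><sf:potatoes><other-prefix:other-value>"
           "/patata<n><f><pl><more:other>$")
for lemma, primary, secondary in parse_lexical_unit(example):
    print(lemma, primary, secondary)
```

A program that doesn't care about secondary information could simply ignore
the dictionary and keep working with the primary tags as it does today.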

However, if we all think consistency should be our primary focus, this
could be considered.

Tanmai

On Sun, Mar 29, 2020 at 2:29 AM Scoop Gracie <scoopgra...@gmail.com> wrote:

> Or <pl>=<number:pl>
>
> On Sat, Mar 28, 2020, 13:58 Scoop Gracie <scoopgra...@gmail.com> wrote:
>
>> That sounds like a great idea to me. Maybe <n> could even become <pos:n>?
>>
>> On Sat, Mar 28, 2020, 13:51 Tanmai Khanna <khanna.tan...@gmail.com>
>> wrote:
>>
>>> Hey guys,
>>> As part of the project to eliminate trimming, I had to come up with a
>>> way to include the surface form in the lexical unit and hence modify the
>>> apertium stream format. To do this I would have to modify the parsers of
>>> every program in the pipeline, and if that has to happen, we discussed on
>>> the IRC that *it might be a good idea to modify the stream in such a
>>> way that we can include an arbitrary amount of information in a lexical
>>> unit, and each program can use whatever information they need.*
>>>
>>> The current information in the lexical unit would be primary
>>> information, and then we would have optional secondary information which
>>> could contain the surface form, but also literally anything you can think
>>> of (case, sentiment, pragmatic info, etc.). This would open up a lot of
>>> possibilities for each program, and it would strengthen the apertium stream
>>> format considerably.
>>>
>>> We discussed several possible syntaxes for this new stream format, and the
>>> one that seems the best is something like this:
>>>
>>> ^potato<n><pl><case:aa><sf:potatoes><other-prefix:other-value>/patata<n><f><pl><more:other>$
>>>
>>> This doesn't mess with the current stream format too much. The number of
>>> tags is already arbitrary so that helps. The secondary tags contain a ":"
>>> that would help distinguish them from primary tags.
>>>
>>> To implement this a modification would still be needed to all the
>>> parsers but the benefits far outweigh the amount of work needed to pull
>>> this off.
>>>
>>> Since this would be a major fundamental change to Apertium, I request
>>> you all to contribute with your views, any pros, cons, suggestions - to the
>>> idea, to the syntax, anything.
>>>
>>> Thanks and Regards,
>>> Tanmai Khanna
>>>
>>> --
>>> *Khanna, Tanmai*
>>> _______________________________________________
>>> Apertium-stuff mailing list
>>> Apertium-stuff@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>>


-- 
*Khanna, Tanmai*
