Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

Tanmai Khanna Thu, 03 Sep 2020 00:10:57 -0700

Oh I see the hyphen thing. That should've been fixed after the latest
commit. Will check it out.


*तन्मय खन्ना *
*Tanmai Khanna*


On Thu, Sep 3, 2020 at 12:34 PM Tanmai Khanna <khanna.tan...@gmail.com>
wrote:

> Hey,
> As of now, the analyser treats wordbound blanks as normal blanks, so
> when they occur, the dictionary will often fail to recognise multiwords.
> This was done because we are offloading multiwords to
> apertium-separable anyway. As for I<sup>ér</sup>, provided that Tino Didriksen
> can fix this without adding spaces around it, adding I<b/>ér to the
> dictionary will make the analyser recognise the word, but there's a problem,
> described further down. If spaces are added, then the space will be one blank
> and the wordbound blank another, so the entry won't match.
>
> If these kinds of cases can't be handled in apertium-separable, then I can
> at some point modify the analyser to ignore wblanks when doing FST
> matching, although I guess the point of the offloading was that the
> analyser stops handling anything that has a blank inside it. But since
> wblanks aren't supposed to be "blanks", technically, we can add I<b/>ér to
> the dictionary, and modify the analyser to deal with it.
>
> But there is a much bigger problem wherever we handle this: wordbound
> blanks apply to whole LUs, so if at any point we get an LU ^Iér$, then there's
> really no way to tell the pipe that the superscript applies only to the ér.
> So yeah, fundamentally, as of now *it's not possible to have markup on
> part of an LU.* It is possible, though, if you keep I & ér as separate LUs.
>
> Also Hèctor, is the space after hyphen issue still there? Looks fine to me.
>
> *तन्मय खन्ना *
> *Tanmai Khanna*
>
>
> On Wed, Sep 2, 2020 at 4:55 PM Tino Didriksen <m...@tinodidriksen.com>
> wrote:
>
>> That's not something the pipe ever sees - you can't fix it on your end.
>> It's something I have to adjust in Transfuse.
>>
>> https://github.com/TinoDidriksen/Transfuse/blob/master/src/dom.cpp#L604
>> and L629 expands inline tags to encompass surrounding plain text, because
>> it is unfortunately common for formatting to be partially on a word while
>> you really want the whole word translated as a unit.
>>
>> However, for HTML I should add spaces around <sub> and <sup> so that they
>> can't gobble up their surroundings. Tracked as
>> https://github.com/TinoDidriksen/Transfuse/issues/7
>>
>> -- Tino Didriksen
>>
>>
>> On Wed, 2 Sep 2020 at 12:58, Hèctor Alòs i Font <hectora...@gmail.com>
>> wrote:
>>
>>> I'm taking a look at this list of names on Wikipedia:
>>> https://frp.wikipedia.org/wiki/Lista_des_comtos_et_ducs_de_Savou%C3%A8
>>> and how it is translated in beta.apertium:
>>> https://beta.apertium.org/index.fra.html?dir=frp-fra&qP=https%3A%2F%2Ffrp.wikipedia.org%2Fwiki%2FLista_des_comtos_et_ducs_de_Savou%25C3%25A8#webpageTranslation
>>>
>>> There are still quite a few problems with HTML tags: the whole Iér
>>> becomes a superscript, and there are also problems with italics. The space
>>> after the hyphen is an already known problem.
>>>
>>> By the way, I wonder whether it is possible to match I<sup>ér</sup> in
>>> our dictionaries. I have Iér in the dictionary, but when the ending ér is
>>> set as a superscript, as is usually done in texts, it is not matched.
>>> Should I add I<b/>ér to the dictionary?
>>>
>>> Hèctor
>>>
>> _______________________________________________
>> Apertium-stuff mailing list
>> Apertium-stuff@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>
>
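
To make the LU-level limitation from my earlier mail concrete, here is a rough sketch in Python. It is a toy model and not Apertium code: the LexicalUnit class, its wblank field and the render function are hypothetical names made up for illustration, and a real wordbound blank carries more than a bare tag name. The only thing it is meant to show is that markup attached to an LU can only wrap the whole LU.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LexicalUnit:
    """Hypothetical stand-in for an LU (the text between ^ and $)."""
    surface: str
    # Assumption: one inline tag name per LU, attached to the LU as a whole.
    wblank: Optional[str] = None


def render(units: List[LexicalUnit]) -> str:
    """Re-apply markup: each wblank wraps the entire surface of its LU."""
    parts = []
    for lu in units:
        if lu.wblank is not None:
            parts.append(f"<{lu.wblank}>{lu.surface}</{lu.wblank}>")
        else:
            parts.append(lu.surface)
    return "".join(parts)


# One LU for the whole word: the superscript has nowhere to go but
# around all of it.
merged = [LexicalUnit("Iér", wblank="sup")]

# Two separate LUs: the superscript can sit on the ending alone.
split = [LexicalUnit("I"), LexicalUnit("ér", wblank="sup")]

print(render(merged))
print(render(split))
```

With a single LU the model has no slot to record "only the last two characters", which is why keeping I and ér as separate LUs is the only way to preserve the partial superscript in this picture.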

