Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

Tanmai Khanna Thu, 03 Sep 2020 00:05:40 -0700

Hey,
As of now, the analyser treats wordbound blanks as normal blanks, so when
they occur inside a multiword, the dictionary will often not recognise it.
This was done because we are offloading multiwords to apertium-separable
anyway. As for I<sup>ér</sup>: provided that Tino Didriksen is able to fix
this without adding spaces around it, adding I<b/>ér to the dictionary will
make the analyser recognise the word, but there's a problem, as explained
further down. If spaces are added, then the space will be one blank and the
wordbound blank another, so it won't match.

If these kinds of cases can't be handled in apertium-separable, then I can
at some point modify the analyser to ignore wblanks when doing FST
matching, although I guess the point of the offloading was that the
analyser stops handling anything that has a blank inside it. But since
wblanks aren't supposed to be "blanks", technically, we can add I<b/>ér to
the dictionary and modify the analyser to deal with it.
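
To make the difference concrete, here is a toy Python sketch of the two
matching behaviours. It is not how lt-proc is implemented: the set stands in
for the compiled FST, the function names are made up for illustration, and
the wblank content "t:sup:abc123" is a hypothetical id. It only assumes the
stream convention that "[[...]]" opens a wordbound blank and "[[/]]" closes
it.

```python
import re

# Toy dictionary; the real analyser uses a compiled FST, not a Python set.
DICTIONARY = {"Iér"}

# Assumed stream syntax: "[[...]]" opens a wordbound blank, "[[/]]" closes it.
WBLANK = re.compile(r"\[\[.*?\]\]")


def match_wblank_as_blank(text):
    """Behaviour described above: a wblank splits the token like any blank."""
    pieces = [p for p in WBLANK.split(text) if p]
    return [p for p in pieces if p in DICTIONARY]


def match_ignoring_wblanks(text):
    """Possible change: drop wblanks before looking the token up."""
    stripped = WBLANK.sub("", text)
    return [stripped] if stripped in DICTIONARY else []


# Hypothetical deformatter output for I<sup>ér</sup> without added spaces.
sample = "I[[t:sup:abc123]]ér[[/]]"

print(match_wblank_as_blank(sample))   # "I" and "ér" are looked up separately
print(match_ignoring_wblanks(sample))  # the whole "Iér" is looked up
```

In this toy version the first function finds nothing, because "I" and "ér"
are looked up separately, while the second one finds "Iér".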

But there is a much bigger problem wherever we handle this: wordbound
blanks apply to whole LUs, so if at any point we get an LU ^Iér$, then
there's really no way to tell the pipe that the superscript applies only to
the ér. So yeah, fundamentally, as of now *it's not possible to have markup
on part of an LU.* It is possible, though, if you keep I and ér as separate
LUs.
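
A small Python sketch of why this is so, under the assumption that markup is
stored per LU with no character offsets. The LU class and render function
are hypothetical names for illustration, not the pipeline's real data
structures.

```python
from dataclasses import dataclass, field


@dataclass
class LU:
    surface: str
    # Markup attached to the WHOLE unit; there is no offset into the surface.
    wblanks: list = field(default_factory=list)


def render(units):
    """Wrap each unit's full surface form in its markup tags."""
    out = []
    for lu in units:
        text = lu.surface
        for tag in lu.wblanks:
            text = f"<{tag}>{text}</{tag}>"
        out.append(text)
    return "".join(out)


merged = [LU("Iér", ["sup"])]            # one LU: markup covers all of it
split = [LU("I"), LU("ér", ["sup"])]     # two LUs: markup only on the second

print(render(merged))
print(render(split))
```

With one merged LU the superscript can only wrap the whole of "Iér"; with
two LUs it can wrap just the "ér".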

Also Hèctor, is the space-after-hyphen issue still there? It looks fine to me.

*तन्मय खन्ना *
*Tanmai Khanna*

On Wed, Sep 2, 2020 at 4:55 PM Tino Didriksen <m...@tinodidriksen.com>
wrote:

> That's not something the pipe ever sees - you can't fix it on your end.
> It's something I have to adjust in Transfuse.
>
> https://github.com/TinoDidriksen/Transfuse/blob/master/src/dom.cpp#L604
> and L629 expands inline tags to encompass surrounding plain text, because
> it is unfortunately common for formatting to be partially on a word while
> you really want the whole word translated as a unit.
>
> However, for HTML I should add spaces around <sub> and <sup> so that they
> can't gobble up their surroundings. Tracked as
> https://github.com/TinoDidriksen/Transfuse/issues/7
>
> -- Tino Didriksen
>
>
> On Wed, 2 Sep 2020 at 12:58, Hèctor Alòs i Font <hectora...@gmail.com>
> wrote:
>
>> I'm taking a look at this list of names on Wikipedia:
>> https://frp.wikipedia.org/wiki/Lista_des_comtos_et_ducs_de_Savou%C3%A8
>> and how it is translated in beta.apertium:
>> https://beta.apertium.org/index.fra.html?dir=frp-fra&qP=https%3A%2F%2Ffrp.wikipedia.org%2Fwiki%2FLista_des_comtos_et_ducs_de_Savou%25C3%25A8#webpageTranslation
>>
>> There are still quite a few problems with HTML tags: the whole Iér is
>> becoming a superscript, and there are problems with italics too. The space
>> after the hyphen is an already known problem.
>>
>> By the way, I wonder whether it is possible to match in our dictionaries
>> I<sup>ér</sup>. I have Iér in the dictionary, but when the ending ér stays
>> as a superscript, as usually done in the texts, it is not matched. Should I
>> add I<b/>ér to the dictionary?
>>
>> Hèctor
>>
> _______________________________________________
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>

