Hey,

As of now, the analyser treats wordbound blanks as normal blanks, so when they occur, the dictionary will often fail to recognise multiwords. This was done because we are offloading multiwords to apertium-separable anyway. As for I<sup>ér</sup>: provided that Tino Didriksen is able to fix this without adding spaces around the tag, adding I<b/>ér to the dictionary will make the analyser recognise the word, but there's a deeper problem, which I describe below. If spaces are added, then the space will be one blank and the wordbound blank another, so it won't match.
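To make the blank-counting argument concrete, here is a toy Python model, not the real analyser. It assumes a simplified notation where `[[...]]` stands for a wordbound blank and `<b/>` marks a blank in a dictionary entry; the function names and the set-based "dictionary" are hypothetical, and real lt-proc matching walks an FST rather than comparing strings.

```python
import re

# Toy model only: "[[...]]" is a simplified stand-in for a wordbound blank,
# and "<b/>" marks a blank inside a dictionary entry. Real lt-proc matching
# walks an FST; a set of strings is used here purely for illustration.
WBLANK = re.compile(r"\[\[[^\]]*\]\]")
DICTIONARY = {"Iér", "I<b/>ér"}  # hypothetical entries


def to_pattern(text: str, ignore_wblanks: bool) -> str:
    """Rewrite input the way the matcher sees it: each blank becomes <b/>."""
    parts = []
    for tok in re.split(r"(\[\[[^\]]*\]\]|\s+)", text):
        if not tok:
            continue  # re.split yields "" between adjacent separators
        if WBLANK.fullmatch(tok):
            # ignore_wblanks=False models the current analyser (a wordbound
            # blank counts as an ordinary blank); True models the proposed change.
            if not ignore_wblanks:
                parts.append("<b/>")
        elif tok.isspace():
            parts.append("<b/>")
        else:
            parts.append(tok)
    return "".join(parts)


def recognised(text: str, ignore_wblanks: bool) -> bool:
    return to_pattern(text, ignore_wblanks) in DICTIONARY


def as_lu(text: str) -> str:
    """Merge the text into one LU; every wblank ends up in front of the whole LU."""
    wblanks = "".join(WBLANK.findall(text))
    surface = WBLANK.sub("", text)
    return f"{wblanks}^{surface}$"


if __name__ == "__main__":
    for text in ("I[[sup]]ér", "I [[sup]]ér"):
        for ignore in (False, True):
            print(text, ignore, to_pattern(text, ignore), recognised(text, ignore))
    print(as_lu("I[[sup]]ér"))  # [[sup]]^Iér$
```

With the current behaviour, "I[[sup]]ér" becomes I<b/>ér and matches, while "I [[sup]]ér" becomes I<b/><b/>ér and does not. Ignoring wblanks makes both match, but `as_lu` shows the remaining problem: once I and ér form one LU, the wblank can only sit in front of the whole `^Iér$`, so the fact that only ér was superscripted is lost.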
If these kinds of cases can't be handled in apertium-separable, then at some point I can modify the analyser to ignore wblanks when doing FST matching, although I guess the point of the offloading was that the analyser stops handling anything with a blank in it. But since wblanks aren't supposed to be "blanks", technically, we can add I<b/>ér to the dictionary and modify the analyser to deal with it.

But there is a much bigger problem wherever we handle this: wordbound blanks apply to whole LUs, so if at any point we get an LU ^Iér$, there's really no way to tell the pipe that the superscript applies only to the ér. So yeah, fundamentally, as of now *it's not possible to have markup on part of an LU.* It is possible, though, if you keep I and ér as separate LUs.

Also Hèctor, is the space-after-hyphen issue still there? It looks fine to me.

*तन्मय खन्ना*
*Tanmai Khanna*

On Wed, Sep 2, 2020 at 4:55 PM Tino Didriksen <m...@tinodidriksen.com> wrote:
> That's not something the pipe ever sees - you can't fix it on your end.
> It's something I have to adjust in Transfuse.
>
> https://github.com/TinoDidriksen/Transfuse/blob/master/src/dom.cpp#L604
> and L629 expands inline tags to encompass surrounding plain text, because
> it is unfortunately common for formatting to be partially on a word while
> you really want the whole word translated as a unit.
>
> However, for HTML I should add spaces around <sub> and <sup> so that they
> can't gobble up their surroundings.
> Tracked as
> https://github.com/TinoDidriksen/Transfuse/issues/7
>
> -- Tino Didriksen
>
>
> On Wed, 2 Sep 2020 at 12:58, Hèctor Alòs i Font <hectora...@gmail.com>
> wrote:
>
>> I'm taking a look at this list of names on Wikipedia:
>> https://frp.wikipedia.org/wiki/Lista_des_comtos_et_ducs_de_Savou%C3%A8
>> and at how it is translated in beta.apertium:
>> https://beta.apertium.org/index.fra.html?dir=frp-fra&qP=https%3A%2F%2Ffrp.wikipedia.org%2Fwiki%2FLista_des_comtos_et_ducs_de_Savou%25C3%25A8#webpageTranslation
>>
>> There are still quite a few problems with HTML tags: the whole Iér
>> becomes a superscript, and there are also problems with italics. The
>> space after the hyphen is an already known problem.
>>
>> By the way, I wonder whether it is possible to match I<sup>ér</sup> in
>> our dictionaries. I have Iér in the dictionary, but when the ending ér is
>> a superscript, as is usual in texts, it is not matched. Should I
>> add I<b/>ér to the dictionary?
>>
>> Hèctor
>>
> _______________________________________________
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>