Oh I see the hyphen thing. That should've been fixed after the latest commit. Will check it out.
*तन्मय खन्ना*
*Tanmai Khanna*

On Thu, Sep 3, 2020 at 12:34 PM Tanmai Khanna <khanna.tan...@gmail.com> wrote:

> Hey,
> As of now, the analyser sees wordbound blanks as normal blanks, so when
> they occur, the dictionary will often not recognise multiwords. This was
> done because we are offloading multiwords to apertium-separable anyway.
> As for I<sup>ér</sup>, given that Tino Didriksen is able to fix this
> without adding spaces around it, adding I<b/>ér to the dictionary will
> make it recognise the word, but there's a problem, as stated later. If
> spaces are added, then the space will be one blank and the wordbound
> blank another, so it won't match.
>
> If these kinds of cases can't be handled in apertium-separable, then I can
> at some point modify the analyser to ignore wblanks when doing FST
> matching, although I guess the point of the offloading was that in the
> analyser we stop handling anything that has a blank in it. But since
> wblanks aren't supposed to be "blanks", technically, we can add I<b/>ér to
> the dictionary and modify the analyser to deal with it.
>
> But there is a much bigger problem wherever we handle this: wordbound
> blanks apply to LUs, so if at any point we get an LU ^Iér$, then there's
> really no way to tell the pipe that the superscript applies just to the ér.
> So yeah, fundamentally, as of now *it's not possible to have markup on
> part of an LU.* It's possible, though, if you keep I & ér as separate LUs.
>
> Also Hèctor, is the space-after-hyphen issue still there? Looks fine to me.
>
> *तन्मय खन्ना*
> *Tanmai Khanna*
>
>
> On Wed, Sep 2, 2020 at 4:55 PM Tino Didriksen <m...@tinodidriksen.com>
> wrote:
>
>> That's not something the pipe ever sees - you can't fix it on your end.
>> It's something I have to adjust in Transfuse.
>>
>> https://github.com/TinoDidriksen/Transfuse/blob/master/src/dom.cpp#L604
>> and L629 expands inline tags to encompass surrounding plain text, because
>> it is unfortunately common for formatting to be partially on a word while
>> you really want the whole word translated as a unit.
>>
>> However, for HTML I should add spaces around <sub> and <sup> so that they
>> can't gobble up their surroundings. Tracked as
>> https://github.com/TinoDidriksen/Transfuse/issues/7
>>
>> -- Tino Didriksen
>>
>>
>> On Wed, 2 Sep 2020 at 12:58, Hèctor Alòs i Font <hectora...@gmail.com>
>> wrote:
>>
>>> I'm taking a look at this list of names on Wikipedia:
>>> https://frp.wikipedia.org/wiki/Lista_des_comtos_et_ducs_de_Savou%C3%A8
>>> and at how it is translated in beta.apertium:
>>> https://beta.apertium.org/index.fra.html?dir=frp-fra&qP=https%3A%2F%2Ffrp.wikipedia.org%2Fwiki%2FLista_des_comtos_et_ducs_de_Savou%25C3%25A8#webpageTranslation
>>>
>>> There are still quite a few problems with HTML tags: the whole Iér is
>>> becoming a superscript, and there are also problems with italics. The
>>> space after the hyphen is an already known problem.
>>>
>>> By the way, I wonder whether it is possible to match I<sup>ér</sup> in
>>> our dictionaries. I have Iér in the dictionary, but when the ending ér
>>> stays as a superscript, as is usually done in the texts, it is not
>>> matched. Should I add I<b/>ér to the dictionary?
>>>
>>> Hèctor
>>>
>> _______________________________________________
>> Apertium-stuff mailing list
>> Apertium-stuff@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>