Message from Tanmai Khanna <khanna.tan...@gmail.com> on Thu, 3 Sep 2020 at 23:10:
> Hèctor,
> The extra blank there is because there's a blank in your rule output. See:
>
> $ echo "^052<num>/052<num>$^F<n><m><sp>/F<n><m><sp>$" | apertium-transfer -z -b 'apertium-fra-frp.frp-fra.t1x' 'frp-fra.t1x.bin'
>
> ^num_n<SN><m><sp><sl_m><sl_sp>{^052<num>$ ^F<n><m><sp>$}$
>
> The rule for num_n has a <b/> in the rule output and hence there's a
> space. The reason there wasn't a space earlier is that an empty string
> was considered a blank. Now, if you don't want a space between the LUs in
> the rule output, you just don't put a <b/>. So if you remove the <b/> from
> the num_n rule it will start working properly. Earlier you used to add a
> <b/> every time the rule had multiple LUs in the output, but now *you only
> add a <b/> if you want a space/blank between the output words.*
>
> Try removing the <b/> and it should work.
>

So you are saying that the new behaviour is not backwards compatible, aren't you? There isn't any <b/> in the rule, only <b pos="1,2..."/>, which is not the same thing. Until now, <b/> meant explicitly inserting a blank, while <b pos="1,2..."/> meant copying to the output whatever is in the input at a given position. Superblanks are blanks most of the time but, as you now probably know better than anyone else, they can be lots of things; they can even contain no blank at all. In some cases, such as Romance-language enclitics, we know there shouldn't be any blank before them, but we still had to add <b pos="1,2..."/> so as not to lose information about italics, bold, etc.

I'm not really ready to change every <b pos="1,2..."/> in the hundreds of rules I've written in several language pairs. Specifically for apertium-fra-frp, I hope to be able to publish it before the new version of the Apertium core you are preparing, so I need those rules to work right now.

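To make the distinction concrete, here is a toy model in Python of the two elements as I describe them above. It is only a sketch of the semantics I have been relying on, not how apertium-transfer is actually implemented; the function name render_output and the tuple encoding of the rule output are made up for illustration.

```python
def render_output(lus, blanks, pattern):
    """Join rule-output items into one string (toy model, not Apertium code).

    lus     -- lexical units matched by the rule, in input order
    blanks  -- blanks[i] is the input text found between lus[i] and lus[i + 1]
    pattern -- output items: ("lu", n) for the n-th unit,
               ("b", None) for <b/>, ("b", n) for <b pos="n"/>
    """
    parts = []
    for kind, pos in pattern:
        if kind == "lu":
            parts.append(lus[pos - 1])
        elif pos is None:
            parts.append(" ")              # <b/>: always a literal space
        else:
            parts.append(blanks[pos - 1])  # <b pos="n"/>: copy the input blank
    return "".join(parts)


lus = ["^052<num>$", "^F<n><m><sp>$"]
blanks = [""]  # in "052F" there is nothing between the two units

# Output written with <b/> between the two units.
explicit = render_output(lus, blanks, [("lu", 1), ("b", None), ("lu", 2)])
# Output written with <b pos="1"/> between the two units.
copied = render_output(lus, blanks, [("lu", 1), ("b", 1), ("lu", 2)])

print(repr(explicit))
print(repr(copied))
```

Under this model only the <b/> variant puts a space between 052 and F; the <b pos="1"/> variant copies the empty input blank and keeps them together, which is the behaviour my rules assume.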
Hèctor

> As for the discussion about I<b/>ér or 5<b/>e, we all agreed that we
> don't want them in the dictionaries, so you can analyse them as
> individual LUs and then use apertium-separable to combine them into
> one LU. Finally, the space between l and ér shouldn't appear in the rule
> output; it is there because of an issue that's still being fixed. But it'll
> be fine soon :)
>
> *तन्मय खन्ना *
> *Tanmai Khanna*
>
> On Thu, Sep 3, 2020 at 11:46 PM Hèctor Alòs i Font <hectora...@gmail.com> wrote:
>
>> Hi Tanmai,
>>
>> Yes, hyphens and quotes (") seem to be solved. But the system still
>> adds blanks where there were none. For instance, we now get strange
>> Unicode codes:
>>
>> 05076. Table des caractères Unicode U+0500 à U+052F.
>> < 05076. Tâbla des caractèros Unicode *U+0500 a *U+052F.
>> ---
>> > 05076. Tâbla des caractèros Unicode *U+0500 a *U+052 F.
>>
>> The same goes for names of standards (e.g. 802.3j), road names, car
>> (Fiat 621RN) or plane (EA-18G Growler) models, etc.
>>
>> As for the <sup>...</sup>, I wouldn't say it is very beautiful. It could
>> be misleading if there is just one character, as often happens, like in
>> 5e. In any case, what interests me most is how to deal with these things
>> in the dictionaries. That's not a problem of the new blank treatment or
>> of Transfuse. It's a problem we already had, but I never thought about
>> it. I wouldn't like to have I<b/>ér or 5<b/>e in the dictionaries. It may
>> cause problems, among other things because ér and e can be words of
>> their own, so we'd get a wrong morphological analysis.
>>
>> Hèctor
>>
>> Message from Tanmai Khanna <khanna.tan...@gmail.com> on Thu, 3 Sep 2020 at 18:57:
>>
>>> Hèctor, can you check the page on beta now? The hyphen and the
>>> superscript issues are solved. Of course, there's now a space between l
>>> and ér. If that's a big problem we can discuss other solutions.
>>>
>>> *तन्मय खन्ना *
>>> *Tanmai Khanna*
>>>
>>> On Thu, Sep 3, 2020 at 8:09 PM Tino Didriksen <m...@tinodidriksen.com> wrote:
>>>
>>>> I have adjusted how Transfuse treats spaces for Apertium, and
>>>> implemented adding temporary spaces around <sub> and <sup>. Changes are
>>>> deployed on beta.
>>>>
>>>> I repeat my plea that all symbols should have an analysis. It breaks
>>>> markup that things like - and : are not tokens.
>>>>
>>>> -- Tino Didriksen
>>>>
>>>> On Wed, 2 Sep 2020 at 13:23, Tino Didriksen <m...@tinodidriksen.com> wrote:
>>>>
>>>>> That's not something the pipe ever sees - you can't fix it on your
>>>>> end. It's something I have to adjust in Transfuse.
>>>>>
>>>>> https://github.com/TinoDidriksen/Transfuse/blob/master/src/dom.cpp#L604
>>>>> and L629 expands inline tags to encompass surrounding plain text, because
>>>>> it is unfortunately common for formatting to be partially on a word while
>>>>> you really want the whole word translated as a unit.
>>>>>
>>>>> However, for HTML I should add spaces around <sub> and <sup> so that
>>>>> they can't gobble up their surroundings. Tracked as
>>>>> https://github.com/TinoDidriksen/Transfuse/issues/7
>>>>>
>>>>> -- Tino Didriksen
>>>>>
>>>>> On Wed, 2 Sep 2020 at 12:58, Hèctor Alòs i Font <hectora...@gmail.com> wrote:
>>>>>
>>>>>> I'm taking a look at this list of names on Wikipedia:
>>>>>> https://frp.wikipedia.org/wiki/Lista_des_comtos_et_ducs_de_Savou%C3%A8
>>>>>> and at how it is translated on beta.apertium:
>>>>>> https://beta.apertium.org/index.fra.html?dir=frp-fra&qP=https%3A%2F%2Ffrp.wikipedia.org%2Fwiki%2FLista_des_comtos_et_ducs_de_Savou%25C3%25A8#webpageTranslation
>>>>>>
>>>>>> There are still quite a few problems with HTML tags: the whole Iér
>>>>>> becomes a superscript, and there are problems with italics too. The
>>>>>> space after the hyphen is an already known problem.
>>>>>>
>>>>>> By the way, I wonder whether it is possible to match
>>>>>> I<sup>ér</sup> in our dictionaries. I have Iér in the dictionary, but
>>>>>> when the ending ér stays as a superscript, as is usually done in the
>>>>>> texts, it is not matched. Should I add I<b/>ér to the dictionary?
>>>>>>
>>>>>> Hèctor
>>>>>>
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff