Message from Tanmai Khanna <khanna.tan...@gmail.com> on Thu, 3 Sep 2020
at 23:10:

> Hèctor,
> The extra blank there is because there's a blank in your rule output. See:
>
> $ echo "^052<num>/052<num>$^F<n><m><sp>/F<n><m><sp>$" | apertium-transfer
> -z -b 'apertium-fra-frp.frp-fra.t1x' 'frp-fra.t1x.bin'
>
> ^num_n<SN><m><sp><sl_m><sl_sp>{^052<num>$ ^F<n><m><sp>$}$
>
>
> The rule for num_n has a <b/> in the rule output and hence there's a
> space. The reason there wasn't a space earlier is that an empty string
> was considered a blank. Now, if you don't want a space between the LUs in
> the rule output, you just don't put a <b/>. So if you remove the <b/> from
> the num_n rule it will start working properly. Earlier you used to add a
> <b/> every time the rule had multiple LUs in the output, but now *you only
> add a <b/> if you want a space/blank between the output words.*
>
>
> Try removing the <b/> and it should work.
>


So you are saying that the new stuff is not backwards compatible, aren't
you? There isn't any <b/> in the rule, only <b pos="1,2..."/>, which is not
the same. Until now, <b/> has meant explicitly inserting a blank, while <b
pos="1,2..."/> has meant copying to the output whatever is in the input at
a given position. Superblanks are blanks most of the time, but, as you now
probably know better than anyone else, they can be lots of things; they
can even contain no blanks at all. In some cases, as with Romance-language
enclitics, we even know there shouldn't be any blank at all before them,
but we had to add <b pos="1,2..."/> so as not to lose information on
italics, bold letters, etc.
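
To make the distinction concrete, here is a toy sketch in Python. It is not
Apertium code: the render function and its data model are invented purely
for illustration, and it only models the behaviour as I have described it
until now, where <b/> is always a literal space and <b pos="N"/> copies
whatever stood at position N in the input, even if that is nothing at all.

```python
# Toy model, NOT Apertium code: render the output side of a transfer rule.
# Each output item is a lexical unit (plain string), the literal blank
# "<b/>", or a positional blank ("b", N) standing for <b pos="N"/>.

def render(output_items, input_blanks):
    """input_blanks[N-1] is whatever stood between input words N and N+1."""
    parts = []
    for item in output_items:
        if item == "<b/>":
            parts.append(" ")                        # always an explicit space
        elif isinstance(item, tuple) and item[0] == "b":
            parts.append(input_blanks[item[1] - 1])  # copy the input blank as-is
        else:
            parts.append(item)                       # a lexical unit
    return "".join(parts)

# "052F" (from U+052F) is analysed as 052<num> + F<n>, with nothing between.
blanks = [""]
literal = render(["052", "<b/>", "F"], blanks)
positional = render(["052", ("b", 1), "F"], blanks)
print(repr(literal))     # '052 F'
print(repr(positional))  # '052F'
```

Under this reading, a rule that uses <b pos="1"/> should never introduce a
space that was not in the input, which is why I expect 052F to come out
unchanged.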

I'm not really ready to change every <b pos="1,2..."/> in the hundreds of
rules I've written in several language pairs. Specifically for
apertium-fra-frp, I hope to be able to publish it before the new version
of the Apertium core you are preparing, so these rules need to work right
now.

Hèctor


>
> As for the discussion about I<b/>ér or 5<b/>e, we all agreed that we
> don't want them in the dictionaries, so you can analyse them as
> individual LUs and then use apertium-separable to combine them into
> one LU. Finally, the space between l and ér shouldn't appear in the rule
> output; it is there because of an issue that's still being fixed. But it'll
> be fine soon :)
>
>
>
> *तन्मय खन्ना *
> *Tanmai Khanna*
>
>
> On Thu, Sep 3, 2020 at 11:46 PM Hèctor Alòs i Font <hectora...@gmail.com>
> wrote:
>
>> Hi Tanmai,
>>
>> Yes, hyphens and quotes (") seem to be solved. But the system still adds
>> blanks where there were none. For instance, this means we now get
>> strange Unicode codes:
>>
>> 05076. Table des caractères Unicode U+0500 à U+052F.
>> < 05076. Tâbla des caractèros Unicode *U+0500 a *U+052F.
>> ---
>> > 05076. Tâbla des caractèros Unicode *U+0500 a *U+052 F.
>>
>> The same for names of standards (e.g. 802.3j), road names, car (Fiat
>> 621RN) or plane (EA-18G Growler) models, etc.
>>
>> As for the <sup>...</sup>, I wouldn't say that it looks very nice. It could
>> be misleading when there is just one character, as often happens, like in
>> 5e. In any case, what interests me most is how to deal with these things
>> in the dictionaries. That's not a problem of the new blank treatment or of
>> Transfuse. It's a problem we already had, but I had never thought about it.
>> I wouldn't like to have I<b/>ér or 5<b/>e in the dictionaries. It may cause
>> problems, among other things because ér and e can be words of their own,
>> so we would get a wrong morphological analysis.
>>
>> Hèctor
>>
>>
>>
>>
>> Message from Tanmai Khanna <khanna.tan...@gmail.com> on Thu, 3 Sep 2020
>> at 18:57:
>>
>>> Hèctor can you check the page on beta now? The hyphen and the
>>> superscript issues are solved. Of course, there's now a space between l and
>>> ér. If that's a big problem we can discuss other solutions.
>>>
>>> *तन्मय खन्ना *
>>> *Tanmai Khanna*
>>>
>>>
>>> On Thu, Sep 3, 2020 at 8:09 PM Tino Didriksen <m...@tinodidriksen.com>
>>> wrote:
>>>
>>>> I have adjusted Transfuse with how spaces are treated for Apertium, and
>>>> implemented adding temporary spaces around <sub> and <sup>. Changes are
>>>> deployed on beta.
>>>>
>>>> I repeat my plea that all symbols should have an analysis. It breaks
>>>> markup that things like - and : are not tokens.
>>>>
>>>> -- Tino Didriksen
>>>>
>>>>
>>>> On Wed, 2 Sep 2020 at 13:23, Tino Didriksen <m...@tinodidriksen.com>
>>>> wrote:
>>>>
>>>>> That's not something the pipe ever sees - you can't fix it on your
>>>>> end. It's something I have to adjust in Transfuse.
>>>>>
>>>>> https://github.com/TinoDidriksen/Transfuse/blob/master/src/dom.cpp#L604
>>>>> and L629 expands inline tags to encompass surrounding plain text, because
>>>>> it is unfortunately common for formatting to be partially on a word while
>>>>> you really want the whole word translated as a unit.
>>>>>
>>>>> However, for HTML I should add spaces around <sub> and <sup> so that
>>>>> they can't gobble up their surroundings. Tracked as
>>>>> https://github.com/TinoDidriksen/Transfuse/issues/7
>>>>>
>>>>> -- Tino Didriksen
>>>>>
>>>>>
>>>>> On Wed, 2 Sep 2020 at 12:58, Hèctor Alòs i Font <hectora...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I'm taking a look at this list of names on Wikipedia:
>>>>>> https://frp.wikipedia.org/wiki/Lista_des_comtos_et_ducs_de_Savou%C3%A8
>>>>>> and at how it is translated in beta.apertium:
>>>>>> https://beta.apertium.org/index.fra.html?dir=frp-fra&qP=https%3A%2F%2Ffrp.wikipedia.org%2Fwiki%2FLista_des_comtos_et_ducs_de_Savou%25C3%25A8#webpageTranslation
>>>>>>
>>>>>> There are still quite a few problems with HTML tags: the whole Iér
>>>>>> is becoming a superscript, and italics are affected too. The space
>>>>>> after the hyphen is an already known problem.
>>>>>>
>>>>>> By the way, I wonder whether it is possible to match I<sup>ér</sup>
>>>>>> in our dictionaries. I have Iér in the dictionary, but when the
>>>>>> ending ér is a superscript, as is usual in the texts, it is not
>>>>>> matched. Should I add I<b/>ér to the dictionary?
>>>>>>
>>>>>> Hèctor
>>>>>>
>>>>> _______________________________________________
>>>> Apertium-stuff mailing list
>>>> Apertium-stuff@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>>>