Now it works! Thanks. On 6000 test sentences the Moses2 output is now
actually 100% identical to the standard Moses output.
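
For anyone finding this thread later, here is a minimal Python sketch of the
mismatch as we understand it. It is not Moses or pugixml code: the toy phrase
table and the lookup helper are made up for illustration, and
xml.sax.saxutils.unescape only stands in for the entity replacement that was
applied to the input. The phrase table is keyed on the escaped token, so once
the input is unescaped the lookup misses and the token passes through
untranslated.

```python
from xml.sax.saxutils import unescape

# Hypothetical toy phrase table, keyed on the escaped tokens that the
# tokenizer produces (this is NOT the real Moses phrase-table format).
PHRASE_TABLE = {"d&apos;": "of", "information": "information"}


def lookup(tokens):
    """Translate token by token; unknown tokens are copied through unchanged."""
    return [PHRASE_TABLE.get(token, token) for token in tokens]


source = "d&apos; information"

# Case 1: entities are left alone, so the token matches the phrase table.
kept = source.split()

# Case 2: entities are replaced by the actual symbol before lookup.
unescaped = unescape(source, {"&apos;": "'", "&quot;": '"'}).split()

print(lookup(kept))       # ['of', 'information']
print(lookup(unescaped))  # ["d'", 'information']
```

The second result has the same shape as the "d'" that showed up in the Moses2
output quoted below.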

Vito

2016-09-28 16:12 GMT+02:00 Hieu Hoang <[email protected]>:

> hi Vito,
>
> please git pull and try decoding again. I've just pushed a fix:
>    https://github.com/hieuhoang/mosesdecoder/commit/0005e98b2674906162ce7945c5edd6a42c9ca418
> Basically, I've changed the behaviour of the pugi call so that it
> doesn't unescape the &apos; words.
>
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
> On 28 September 2016 at 14:33, Hieu Hoang <[email protected]> wrote:
>
>> ah ok. do you have a moses.ini and an example input sentence to go with that?
>>
>> pugixml.cpp is used to parse the input sentence for XML markup for
>> placeholders, forced translation, etc. You shouldn't change the code for
>> pugixml 'cos it's an imported library that we don't control and we may
>> reimport it in future if there are new releases. The problem seems to be
>> Moses2's use of the library, so it should be fixed in Moses2.
>>
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu
>>
>> On 28 September 2016 at 14:22, Vito Mandorino <
>> [email protected]> wrote:
>>
>>> We are able to replicate the issue with the probingPT version of this
>>> phrase-table:
>>>
>>> &apos; ||| &apos; ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>> &amp; ||| &amp; ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>> &gt; ||| &gt; ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>> &lt; ||| &lt; ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>> &quot; ||| &quot; ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>> &nbsp; ||| &nbsp; ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>> &#160; ||| &#160; ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>>
>>> If we understand correctly, the origin of the issue is the function
>>> strconv_escape in ./contrib/moses2/pugixml.cpp, which replaces some of
>>> these entities with the actual symbol. Commenting out that part seems to
>>> fix the problem, but we wonder whether this may cause issues elsewhere,
>>> since we don't know the purpose of the entity replacement.
>>>
>>> Best regards,
>>> Vito
>>>
>>> 2016-09-28 11:19 GMT+02:00 Hieu Hoang <[email protected]>:
>>>
>>>> Can you make your model files available for download?
>>>>
>>>> Moses and Moses2 aren't guaranteed to give exactly the same answer.
>>>> However, they should be the same quality overall.
>>>>
>>>> Hieu Hoang
>>>> http://www.hoang.co.uk/hieu
>>>>
>>>> On 28 September 2016 at 09:53, Vito Mandorino <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> we are testing moses2 and we find a decrease in quality which seems to
>>>>> be related to apostrophes. For instance:
>>>>>
>>>>> Source segment 1:
>>>>> mise à disposition des actionnaires des documents d&apos; information
>>>>> relatifs à la sicav
>>>>>
>>>>> MT Moses:
>>>>> provision shareholders of the briefing material for the sicav
>>>>>
>>>>> MT Moses2:
>>>>> provision of shareholders documents d' information concerning the fund
>>>>>
>>>>>
>>>>> Source segment 2:
>>>>> tout titre qui deviendrait spéculatif à la suite d&apos; une
>>>>> rétrogradation après son acquisition par le fonds ne sera pas liquidé , à
>>>>> moins que le conseiller en investissement n&apos; estime qu&apos; il y va
>>>>> de l&apos; intérêt des actionnaires .
>>>>>
>>>>> MT Moses:
>>>>> any security that would become speculative following a downgrading
>>>>> after its takeover by the fund will not be liquidated , unless the
>>>>> investment adviser believes it is in the interest of shareholders .
>>>>>
>>>>> MT Moses2:
>>>>> any security that would become speculative following a possible
>>>>> downgrade d' by the fund after its acquisition will not be liquidated ,
>>>>> unless the investment advisor believes n' stake qu' l' interest of
>>>>> shareholders .
>>>>>
>>>>> It is actually strange that the raw MT output contains the apostrophe
>>>>> symbol instead of the &apos; entity. What could the reason be?
>>>>>
>>>>> Best regards,
>>>>> Vito
>>>>>
>>>>>
>>>>> --
>>>>> M. Vito MANDORINO -- Chief Scientist
>>>>> The Translation Trustee
>>>>> 1, Place Charles de Gaulle, 78180 Montigny-le-Bretonneux
>>>>> Tel : +33 1 30 44 04 23   Mobile : +33 6 84 65 68 89
>>>>> Email : [email protected]
>>>>> Website : www.linguacustodia.finance <http://www.linguacustodia.com/>
>>>>>
>>>>> _______________________________________________
>>>>> Moses-support mailing list
>>>>> [email protected]
>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>>
>


