Now it works! Thanks. On 6000 test sentences the Moses2 output is now actually 100% identical to the standard Moses output.
Vito

2016-09-28 16:12 GMT+02:00 Hieu Hoang <[email protected]>:

> Hi Vito,
>
> Please git pull and try decoding again. I've just pushed a fix:
> https://github.com/hieuhoang/mosesdecoder/commit/0005e98b2674906162ce7945c5edd6a42c9ca418
> Basically, I've changed the behaviour of the pugi call so that it
> doesn't unescape the &apos words.
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
> On 28 September 2016 at 14:33, Hieu Hoang <[email protected]> wrote:
>
>> Ah, OK. Do you have a moses.ini and an example input sentence to go with that?
>>
>> pugixml.cpp is used to parse the input sentence for XML markup for
>> placeholders, forced translation, etc. You shouldn't change the code for
>> pugixml because it's an imported library that we don't control, and we may
>> reimport it in future if there are new releases. The problem seems to be
>> Moses2's use of the library, so it should be fixed in Moses2.
>>
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu
>>
>> On 28 September 2016 at 14:22, Vito Mandorino <[email protected]> wrote:
>>
>>> We are able to replicate the issue with the probingPT version of this
>>> phrase-table:
>>>
>>> &apos; ||| &apos; ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>> &amp; ||| &amp; ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>> &gt; ||| &gt; ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>> &lt; ||| &lt; ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>> &quot; ||| &quot; ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>> ||| ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>>   |||   ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>>
>>> If we understand correctly, the origin of the issue is the function
>>> strconv_escape in ./contrib/moses2/pugixml.cpp, which replaces some of
>>> these entities with the actual symbol. Commenting out that part seems to
>>> fix the problem, but we wonder whether this may cause issues elsewhere,
>>> since we don't know the purpose of the entity replacement.
>>>
>>> Best regards,
>>> Vito
>>>
>>> 2016-09-28 11:19 GMT+02:00 Hieu Hoang <[email protected]>:
>>>
>>>> Can you make your model files available for download?
>>>>
>>>> Moses and Moses2 aren't guaranteed to give exactly the same answer.
>>>> However, they should be the same quality overall.
>>>>
>>>> Hieu Hoang
>>>> http://www.hoang.co.uk/hieu
>>>>
>>>> On 28 September 2016 at 09:53, Vito Mandorino <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We are testing Moses2 and we find a decrease in quality which seems to
>>>>> be related to apostrophes. For instance:
>>>>>
>>>>> Source segment 1:
>>>>> mise à disposition des actionnaires des documents d' information
>>>>> relatifs à la sicav
>>>>>
>>>>> MT Moses:
>>>>> provision shareholders of the briefing material for the sicav
>>>>>
>>>>> MT Moses2:
>>>>> provision of shareholders documents d' information concerning the fund
>>>>>
>>>>>
>>>>> Source segment 2:
>>>>> tout titre qui deviendrait spéculatif à la suite d' une
>>>>> rétrogradation après son acquisition par le fonds ne sera pas liquidé , à
>>>>> moins que le conseiller en investissement n' estime qu' il y va
>>>>> de l' intérêt des actionnaires .
>>>>>
>>>>> MT Moses:
>>>>> any security that would become speculative following a downgrading
>>>>> after its takeover by the fund will not be liquidated , unless the
>>>>> investment adviser believes it is in the interest of shareholders .
>>>>>
>>>>> MT Moses2:
>>>>> any security that would become speculative following a possible
>>>>> downgrade d' by the fund after its acquisition will not be liquidated ,
>>>>> unless the investment advisor believes n' stake qu' l' interest of
>>>>> shareholders .
>>>>>
>>>>> It is actually strange that the raw MT output contains the apostrophe
>>>>> symbol instead of the &apos; entity. What could the reason be?
>>>>>
>>>>> Best regards,
>>>>> Vito
>>>>>
>>>>> --
>>>>> M. Vito MANDORINO -- Chief Scientist
>>>>>
>>>>> The Translation Trustee
>>>>> 1, Place Charles de Gaulle, 78180 Montigny-le-Bretonneux
>>>>> Tel : +33 1 30 44 04 23   Mobile : +33 6 84 65 68 89
>>>>> Email : [email protected]
>>>>> Website : www.linguacustodia.finance <http://www.linguacustodia.com/>
>>>>>
>>>>> _______________________________________________
>>>>> Moses-support mailing list
>>>>> [email protected]
>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
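
For readers of the archive, the mismatch discussed in this thread can be illustrated with a toy model. This is only a sketch in Python, not Moses2 or pugixml code: the dictionary `PHRASE_TABLE` and the function `lookup` are hypothetical names, and the table simply mirrors the escaped entries quoted above. It assumes the phrase table stores tokens in escaped form, so an input token that is unescaped before lookup no longer matches anything.

```python
from xml.sax.saxutils import unescape

# Hypothetical toy phrase table keyed by escaped tokens.
# This is not the real probingPT format, only an illustration.
PHRASE_TABLE = {
    "&apos;": "&apos;",
    "&amp;": "&amp;",
    "&lt;": "&lt;",
    "&gt;": "&gt;",
    "&quot;": "&quot;",
}

# unescape() handles &amp;, &lt; and &gt; by default;
# the other two entities have to be passed explicitly.
EXTRA_ENTITIES = {"&apos;": "'", "&quot;": '"'}


def lookup(tokens, unescape_input):
    """Return the table entry for each token, or None when there is no match."""
    results = []
    for token in tokens:
        key = unescape(token, EXTRA_ENTITIES) if unescape_input else token
        results.append(PHRASE_TABLE.get(key))
    return results


tokens = ["&apos;", "&amp;"]
print(lookup(tokens, unescape_input=False))  # escaped tokens match the table
print(lookup(tokens, unescape_input=True))   # unescaped tokens find no entry
```

In this toy setup, a token with no table entry corresponds to the untranslated `d'`, `n'` and `qu'` that were passed through in the Moses2 output above.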
