Re: [Moses-support] Unicode Issues when Using Compact Phrase Table, Binaries vs. Own Build

Венцислав Жечев (Ventsislav Zhechev) Tue, 31 Mar 2015 03:36:55 -0700

Hi,

Any clue what could be messed up on these systems? On Ubuntu I compiled boost 
1.57, cmph and Moses straight out of the box, so I don’t see what I could have 
done wrong there.


I just checked and the gzipped phrase tables are proper UTF-8. I even ran the 
processPhraseTableMin binary from the website on the Ubuntu machine and still 
got the same results. That is, if I query the compact phrase table with the 
queryPhraseTableMin binary from the website, UTF-8 is recognised and I get 
results; if I use the queryPhraseTableMin that I compiled on the same system, 
UTF-8 is not recognised and I get no results.
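
One way to run such a UTF-8 check on a gzipped phrase table is sketched below. 
This is only a minimal sketch in Python, not part of Moses: the function name 
first_invalid_utf8_line and the file name in the usage line are hypothetical, 
and it only tests that every line decodes as UTF-8, nothing about the phrase 
table format itself.

```python
import gzip


def first_invalid_utf8_line(path):
    """Return the 1-based number of the first line that is not valid UTF-8.

    Returns None if every line in the gzipped file decodes cleanly.
    """
    # Read raw bytes so that decoding is done explicitly, line by line.
    with gzip.open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                return number
    return None
```

Calling first_invalid_utf8_line("phrase-table.gz") (file name assumed) returns 
None for a clean file, or the number of the first offending line otherwise.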

Does anyone have an idea what could influence the compilation of Moses in a way 
that would prevent it from properly reading UTF-8?
This is especially puzzling given that the Moses binaries for MacOS X from the 
website don’t seem to read UTF-8 properly either (at least on my machine), and 
I didn’t compile those.


Cheers,

Ventzi

> On 30.03.2015, at 11:08, [email protected] wrote:
> 
> Date: Mon, 30 Mar 2015 11:08:13 +0200
> From: Marcin Junczys-Dowmunt <[email protected]>
> Subject: Re: [Moses-support] Unicode Issues when Using Compact Phrase
>       Table, Binaries vs. Own Build
> To: [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="utf-8"
> 
> Hi,
> the phrase-table and as far as I know Moses in general are 
> unicode-agnostic, as long as you use utf-8. Input is handled as raw byte 
> sequences, most of the time there are numeric identifiers only.
> Sounds more like a couple of messed up systems on your side, especially 
> the part where self-compiled systems work or don't work. Cannot give you 
> much more insight, unfortunately.
> Best,
> Marcin


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
