Hi,

Any clue what systems could be messed up? On Ubuntu I compiled Boost 1.57, cmph and Moses right out of the box, so I don’t see what I could have done wrong there.
I just checked, and the gzipped phrase tables are proper UTF-8. I even ran the processPhraseTableMin binary from the website on the Ubuntu machine and still got the same results. That is, if I query the compact phrase table with the queryPhraseTableMin binary from the website, UTF-8 is recognised and I get results; if I use the queryPhraseTableMin that I compiled on the same system, UTF-8 is not recognised and I get no results.

Does anyone have an idea what could influence the compilation of Moses in a way that would prevent it from properly reading UTF-8? This is especially puzzling given that the Moses binaries for MacOS X from the website don’t seem to read UTF-8 properly either (at least on my machine), and I didn’t compile those.

Cheers,
Ventzi

> On 30.03.2015, at 11:08, [email protected] wrote:
>
> Date: Mon, 30 Mar 2015 11:08:13 +0200
> From: Marcin Junczys-Dowmunt <[email protected]>
> Subject: Re: [Moses-support] Unicode Issues when Using Compact Phrase
> Table, Binaries vs. Own Build
> To: [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="utf-8"
>
> Hi,
> the phrase-table and as far as I know Moses in general are
> unicode-agnostic, as long as you use utf-8. Input is handled as raw byte
> sequences, most of the time there are numeric identifiers only.
> Sounds more like a couple of messed up systems on your side, especially
> the part where self-compiled systems work or don't work. Cannot give you
> much more insight, unfortunately.
> Best,
> Marcin

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
