OK, I think the mystery is solved. The text version does not contain alignment information, and the standard algorithm for the compact phrase table requires alignment information to work properly. If alignments are not present, you should use the "-encoding None -no-alignment-info" options (bigger, but still quite compact). It's even mentioned in the documentation, but I think I should add a test to the binarization tool that croaks if alignment data is missing.

A test with your phrase table and "-encoding None -no-alignment-info" works fine and now produces the correct translation. This also explains why the compression was so cruelly slow, and the result is even smaller than the incorrectly built one.

You wrote that you used a version from July 2012 for training; with a recent moses version, this issue would not have arisen. Including alignment is now standard in the training scripts, and then you can use the standard procedure for compact binarization, which should save roughly an additional 30%.
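
The kind of check mentioned above could look roughly like the following. This is only a sketch and not code from the binarization tool: it assumes the common " ||| "-separated text layout with alignment points such as "0-0 1-1" in the fourth field, and the function names has_alignment and check_phrase_table are made up for illustration.

```python
import re

# One alignment point looks like "0-1" (source index, target index).
ALIGN_POINT = re.compile(r"^\d+-\d+$")


def has_alignment(line: str) -> bool:
    """Return True if a text phrase-table line carries word alignment.

    Assumption: fields are separated by '|||' and the alignment sits in
    the fourth field. Adjust the index if your table is laid out differently.
    """
    fields = [f.strip() for f in line.rstrip("\n").split("|||")]
    if len(fields) < 4:
        return False
    points = fields[3].split()
    # An empty field, or a field holding something else (e.g. counts), fails.
    return bool(points) and all(ALIGN_POINT.match(p) for p in points)


def check_phrase_table(lines, sample=1000):
    """Inspect up to `sample` non-empty lines; True only if all carry alignment."""
    checked = 0
    for line in lines:
        if not line.strip():
            continue
        if not has_alignment(line):
            return False
        checked += 1
        if checked >= sample:
            break
    return checked > 0
```

If such a check fails on a table, that would be the case where "-encoding None -no-alignment-info" is the safer choice.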

BTW: New moses is very verbose, is this on purpose?
Best,
Marcin

On 04.07.2013 22:01, Marcin Junczys-Dowmunt wrote:
The binary format in the main branch has actually never changed since I released it, so it should not be an issue of binary incompatibility. I am planning to add version numbers with the first change to the binary format other than versioning itself :)

On 04.07.2013 21:56, Hieu Hoang wrote:
Do your binary files have version numbers embedded in them? I would highly recommend they do.

kenlm has it, it's even human readable by doing
   head -1
on any kenlm binary file. The decoder throws an error if it is run with an incompatible version.
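
To make the suggestion concrete, here is a minimal sketch of a human-readable version header placed on the first line of a binary file, so that "head -1" shows it and a loader can refuse incompatible files. The magic string, the version number and the function names are all hypothetical; this is neither the kenlm header nor the compact phrase table format.

```python
import io

# Hypothetical magic prefix; a real format would pick its own string.
MAGIC = b"example-binary-pt version "
VERSION = 1


def write_header(stream, version=VERSION):
    """Write a one-line, human-readable header before the binary payload."""
    stream.write(MAGIC + str(version).encode("ascii") + b"\n")


def read_header(stream):
    """Read the first line and return the version number it declares."""
    line = stream.readline()
    if not line.startswith(MAGIC):
        raise ValueError("missing or unrecognised header")
    return int(line[len(MAGIC):].strip())


def check_compatible(stream, supported=(VERSION,)):
    """Raise if the file declares a version this reader does not support."""
    version = read_header(stream)
    if version not in supported:
        raise ValueError(f"unsupported format version {version}")
    return version
```

After check_compatible returns, the stream is positioned at the start of the binary payload, so the existing loading code could continue from there unchanged.
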
On 4 July 2013 20:52, Marcin Junczys-Dowmunt <[email protected] <mailto:[email protected]>> wrote:

    I had a similar issue a few days ago with a quite old moses
    version; recompiling and rebuilding the phrase table seemed to
    solve it, so I did not investigate. However, I am not quite sure
    what I actually did to fix it. Currently I am building the binary
    phrase table from the text version to compare. This will take a
    while, more fun tomorrow.

    On 04.07.2013 21:46, Hieu Hoang wrote:
    it's a bit strange. Many words are unknown in the compact-pt
    version, eg. this 1 word sentence is unknown:
      un
    could it be encoding issues? or the wrong phrase table was
    binarized?

    On 4 July 2013 18:14, Hieu Hoang <[email protected]
    <mailto:[email protected]>> wrote:

        You can download my version
        http://statmt.org/~s0565741/download/alex/
        <http://statmt.org/%7Es0565741/download/alex/>
        I've also filtered the text phrase table so that it can run


        On 4 July 2013 17:47, Marcin Junczys-Dowmunt
        <[email protected] <mailto:[email protected]>> wrote:

            Hi Alexander,
            I am able to log in, but then it hangs indefinitely while
            trying to retrieve the directory list.
            Best,
            Marcin

            On 04.07.2013 16:59, Fishkov, Alexander wrote:

            Hi Hieu and Marcin!

            >> If either of you have a model (no matter how big)
            that reproduces the problem, that I can download, I'll
            look into it

            I have set up an FTP server to share the model, so I am
            sending this message in private (not to the mailing list).

            ftp://hoang:[email protected]/
            <ftp://hoang:moses%[email protected]/>

            The folder structure is as follows:

            /lm – contains binary language model (just in case)

            /model.fr-en – contains translation model in text
            format with moses.ini file

            /compact-model.fr-en – contains compact model produced
            from the previous one with moses.ini

            P.S. I will be out of office until 16 July.

            Best regards, Alexander.





-- Hieu Hoang
        Research Associate
        University of Edinburgh
        http://www.hoang.co.uk/hieu







_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
