Thanks to Marcin and Hieu for investigating this! I had noticed the part about alignment info but thought that it was the default behavior even in older versions. I will rebuild my translation models so that everything works.
From: [email protected] [mailto:[email protected]] On Behalf Of Hieu Hoang
Sent: Friday, July 05, 2013 1:56 AM
To: Marcin Junczys-Dowmunt
Cc: Fishkov, Alexander; moses-support
Subject: Re: [Moses-support] Compact phrase table produces different translations than original binary

Thanks for looking at the problem. A test for alignment info is a good idea. I didn't think there were more verbose messages than before, but I'll look at it again.

On 4 July 2013 22:15, Marcin Junczys-Dowmunt <[email protected]> wrote:

OK, I think the mystery is solved. The text version does not contain alignment information. The standard algorithm for the compact phrase table requires alignment information to work properly. If alignments are not present, you should use the "-encoding None -no-alignment-info" options (bigger, but still quite compact). It's even mentioned in the documentation, but I think I should add a test to the binarization tool that croaks if alignment data is missing.

A test with your phrase table and "-encoding None -no-alignment-info" works fine and produces the correct translation now. This also explains why the compression was so cruelly slow and why the result is even smaller than the incorrectly built one.

You wrote that you used a version from July 2012 for training; with a recent Moses version, this issue would not have arisen. Including alignment information is now standard in the training scripts, so you can then use the standard procedure for compact binarization, which should save roughly an additional 30%.

BTW: New Moses is very verbose, is this on purpose?

Best,
Marcin

On 04.07.2013 at 22:01, Marcin Junczys-Dowmunt wrote:

The binary format in the main branch actually never changed from the moment I released it. So it should not be an issue of binary incompatibility.
I am planning to add version numbers with the first change in the binary format other than versioning itself :)

On 04.07.2013 at 21:56, Hieu Hoang wrote:

Do your binary files have version numbers embedded in them? I would highly recommend they do. kenlm has it; it's even human readable by doing head -1 on any kenlm binary file. The decoder throws errors if run with an incompatible version.

On 4 July 2013 20:52, Marcin Junczys-Dowmunt <[email protected]> wrote:

I had a similar issue a few days ago with a quite old Moses version; recompiling and rebuilding the phrase table seemed to solve it, so I did not investigate. However, I am not quite sure what I actually did to fix it. Currently I am building the binary phrase table from the text version to compare. This will take a while, more fun tomorrow.

On 04.07.2013 at 21:46, Hieu Hoang wrote:

It's a bit strange. Many words are unknown in the compact-pt version, e.g. this one-word sentence is unknown:

un

Could it be encoding issues? Or was the wrong phrase table binarized?

On 4 July 2013 18:14, Hieu Hoang <[email protected]> wrote:

You can download my version:
http://statmt.org/~s0565741/download/alex/
I've also filtered the text phrase table so that it can run.

On 4 July 2013 17:47, Marcin Junczys-Dowmunt <[email protected]> wrote:

Hi Alexander,

I am able to log in, but then it hangs indefinitely while trying to retrieve the directory list.

Best,
Marcin

On 04.07.2013 at 16:59, Fishkov, Alexander wrote:

Hi Hieu and Marcin!

>> If either of you have a model (no matter how big) that reproduces the
>> problem, that i can download, I look into it

I have set up an FTP server to share the model, so I am sending this message in private (not to the mailing list).
ftp://hoang:[email protected]/

The folder structure is as follows:
/lm - contains binary language model (just in case)
/model.fr-en - contains translation model in text format with moses.ini file
/compact-model.fr-en - contains compact model produced from the previous one with moses.ini

P.S. I will be out of office until 16 of July.

Best regards,
Alexander.

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
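
The test Marcin mentions in the thread (one that fails when alignment data is missing from a text phrase table) could be sketched roughly as follows. This is only an illustration and not the actual Moses check: it assumes the text phrase table uses "|||"-separated fields with word alignment points such as "0-0 1-1" in the fourth field, and the function names are hypothetical.

```python
import gzip
import re

# Assumption: an alignment field looks like "0-0 1-2 2-1" (source-target index pairs).
ALIGNMENT_RE = re.compile(r"\d+-\d+(?:\s+\d+-\d+)*")


def open_text(path):
    """Open a plain or gzip-compressed text phrase table for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def line_has_alignment(line):
    """Return True if the fourth '|||'-separated field holds alignment points."""
    fields = [field.strip() for field in line.rstrip("\n").split("|||")]
    # Assumed layout: source ||| target ||| scores ||| alignment ||| counts
    return len(fields) >= 4 and ALIGNMENT_RE.fullmatch(fields[3]) is not None


def has_alignment_info(lines, sample=1000):
    """Check up to `sample` non-empty lines; False as soon as one lacks alignments."""
    checked = 0
    for line in lines:
        if not line.strip():
            continue
        if not line_has_alignment(line):
            return False
        checked += 1
        if checked >= sample:
            break
    # An empty table gives no evidence of alignment info.
    return checked > 0
```

A caller could run `has_alignment_info(open_text(path))` before binarizing and, if it returns False, either stop with an error or fall back to the "-encoding None -no-alignment-info" options quoted above.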
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
