Thanks to Marcin and Hieu for investigating this!
I had noticed the part about alignment info, but thought that it was the
default behavior even in older versions. I will rebuild my translation models
so that everything works.

From: [email protected] [mailto:[email protected]] On Behalf Of Hieu Hoang
Sent: Friday, July 05, 2013 1:56 AM
To: Marcin Junczys-Dowmunt
Cc: Fishkov, Alexander; moses-support
Subject: Re: [Moses-support] Compact phrase table produces different
translations than original binary

Thanks for looking at the problem. A test for alignment info is a good idea.

I didn't think there were more verbose messages than before, but I'll look at
it again.
On 4 July 2013 22:15, Marcin Junczys-Dowmunt 
<[email protected]<mailto:[email protected]>> wrote:
OK, I think the mystery is solved. The text version does not contain alignment
information, and the standard algorithm for the compact phrase table requires
alignment information to work properly.
If alignments are not present, you should use the "-encoding None
-no-alignment-info" options (bigger, but still quite compact). It's even
mentioned in the documentation, but I think I should add a test to the
binarization tool that croaks if alignment data is missing. A test with your
phrase table and "-encoding None -no-alignment-info" works fine and produces
the correct translation now. This also explains why the compression was so
cruelly slow, and the result is even smaller than the incorrectly built one.
You wrote that you used a version from July 2012 for training; with a recent
moses version, this issue would not have arisen. Including alignments is now
standard in the training scripts, so you can then use the standard procedure
for compact binarization, which should save roughly another 30%.

BTW: The new moses is very verbose. Is this on purpose?
Best,
Marcin
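
A minimal sketch, in Python, of the kind of missing-alignment test suggested
above. It assumes the Moses text phrase table layout
"source ||| target ||| scores ||| alignment ||| counts", with alignment points
written as "0-0 1-1". The function names and the sample size are hypothetical,
and this is not the check that the binarization tool itself performs. Since
individual entries might carry an empty alignment field, the sketch only
complains when none of the sampled lines has alignments.

```python
import re

# Assumption: fields are separated by "|||" and an alignment field consists
# only of tokens such as "0-0 1-2".
ALIGNMENT_FIELD = re.compile(r"^\d+-\d+(?: \d+-\d+)*$")


def has_alignment_info(line):
    """Return True if any field after the scores looks like word alignments."""
    fields = [field.strip() for field in line.rstrip("\n").split("|||")]
    # Fields 0-2 are assumed to be source phrase, target phrase and scores.
    return any(ALIGNMENT_FIELD.match(field) for field in fields[3:])


def check_phrase_table(lines, sample_size=1000):
    """Inspect up to `sample_size` non-empty lines.

    Returns (aligned, checked) and raises ValueError if lines were checked
    but none of them carried an alignment field.
    """
    checked = 0
    aligned = 0
    for line in lines:
        if not line.strip():
            continue
        checked += 1
        if has_alignment_info(line):
            aligned += 1
        if checked >= sample_size:
            break
    if checked and aligned == 0:
        raise ValueError("no alignment information found in sampled lines")
    return aligned, checked
```

One way to use it would be to pass an open text phrase table (for example via
gzip.open(path, "rt")) to check_phrase_table before binarizing, and to fall
back to "-encoding None -no-alignment-info" if it raises.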

On 04.07.2013 22:01, Marcin Junczys-Dowmunt wrote:
The binary format in the main branch has actually never changed since I
released it, so it should not be an issue of binary incompatibility. I am
planning to add version numbers with the first change to the binary format
other than versioning itself :)

On 04.07.2013 21:56, Hieu Hoang wrote:
Do your binary files have version numbers embedded in them? I would highly
recommend that they do.

kenlm has it, it's even human readable by doing
   head -1
on any kenlm binary file. The decoder throws an error if run with an
incompatible version.
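
A rough sketch, in Python, of the idea of an embedded, human-readable version
line that a loader checks before reading the rest of the file. The header
text, the version number and the function names are made up for illustration;
this is neither the kenlm file format nor the compact phrase table format.

```python
import io

# Hypothetical header text, chosen only for this sketch.
MAGIC = b"compact-pt-sketch version "
SUPPORTED_VERSION = 1


def write_with_header(stream, payload, version=SUPPORTED_VERSION):
    """Write a one-line, human-readable version header, then the binary payload."""
    stream.write(MAGIC + str(version).encode("ascii") + b"\n")
    stream.write(payload)


def read_with_header(stream):
    """Check the header line and return the payload; raise ValueError on mismatch."""
    header = stream.readline()
    if not header.startswith(MAGIC):
        raise ValueError("missing version header")
    version = int(header[len(MAGIC):].decode("ascii").strip())
    if version != SUPPORTED_VERSION:
        raise ValueError(f"unsupported format version {version}")
    return stream.read()


# Round trip through an in-memory buffer standing in for a file on disk.
buffer = io.BytesIO()
write_with_header(buffer, b"\x00\x01binary payload")
buffer.seek(0)
payload = read_with_header(buffer)
```

Because the header is a plain first line, "head -1" on such a file would show
the version, and a loader can refuse a mismatched file before touching the
binary part.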
On 4 July 2013 20:52, Marcin Junczys-Dowmunt 
<[email protected]<mailto:[email protected]>> wrote:
I had a similar issue a few days ago with a quite old moses version;
recompiling and rebuilding the phrase table seemed to solve it, so I did not
investigate. However, I am not quite sure what I actually did to fix it.
Currently I am building the binary phrase table from the text version to
compare. This will take a while; more fun tomorrow.

On 04.07.2013 21:46, Hieu Hoang wrote:
It's a bit strange. Many words are unknown in the compact-pt version, e.g. this
one-word sentence is unknown:
  un
Could it be an encoding issue? Or was the wrong phrase table binarized?
On 4 July 2013 18:14, Hieu Hoang 
<[email protected]<mailto:[email protected]>> wrote:
You can download my version:
   http://statmt.org/~s0565741/download/alex/
I've also filtered the text phrase table so that it can run

On 4 July 2013 17:47, Marcin Junczys-Dowmunt 
<[email protected]<mailto:[email protected]>> wrote:
Hi Alexander,
I am able to log in, but then it hangs indefinitely while trying to retrieve
the directory list.
Best,
Marcin

On 04.07.2013 16:59, Fishkov, Alexander wrote:
Hi Hieu and Marcin!

>> If either of you has a model (no matter how big) that reproduces the
>> problem and that I can download, I'll look into it
I have set up an FTP server to share the model, so I am sending this message
privately (not to the mailing list).

ftp://hoang:[email protected]/<ftp://hoang:moses%[email protected]/>

The folder structure is as follows:
/lm - contains binary language model (just in case)
/model.fr-en - contains translation model in text format with moses.ini file
/compact-model.fr-en - contains compact model produced from the previous one 
with moses.ini

P.S. I will be out of the office until the 16th of July.

Best regards, Alexander.




--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu



_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
