Thanks Germán. I know this is just a warning, but the problem is that the conversion process stopped after this warning.
Also, the phrase table is 350MB, whereas the binary phrase table is only 170MB. Is that expected?

> Date: Wed, 15 Sep 2010 08:44:40 +0200
> From: [email protected]
> To: [email protected]
> CC: [email protected]
> Subject: Re: [Moses-support] phrasetable-binary
>
> Dear Musa,
>
> As the message itself says, that is not an error, but a warning, and in
> fact it is quite common to get that warning, as far as I know.
>
> What that warning is actually saying is that there are source words in
> your test set which do not appear to have a possible translation within
> your phrase table, i.e. those source words in the test set will be
> considered as unknown by the translation system. This problem is quite
> common, specially if you are trying to translate a test set whose origin
> is very different to the training data.
>
> I hope that explains the message.
>
> Best,
>
> Germán Sanchis-Trilles
>
>
> On Wed, 15 Sep 2010, musa ghurab wrote:
>
> > hi
> > I have problem when converting phrase-table.gz to hard disk binary image. i
> > got the following error:
> >
> > [m...@ibb]# gzip -cd work/20100914/model/phrase-table.gz | LC_ALL=C sort |
> > nlp/moses/misc/processPhraseTable -ttable 0 0 - -nscores 5 -out
> > work/20100914/binary/model/phrase-table
> > processing ptree for stdin
> > ..................................................[phrase:500000]
> > ..........................distinct source phrases: 762319 distinct first
> > words of source phrases: 11727 number of phrase pairs (line count): 3639432
> > WARNING: there are src voc entries with no phrase translation: count 1156
> > There exists phrase translations for 10571 entries
> >
> > i checked the line by the following command, and it seems to be ok.
> >
> > [m...@ibb]# gzip -cd work/20100914/model/phrase-table.gz | sed -n
> > '1150,1160p'
> >
> > Then i removed 10 lines from 1150-1160, and the problem still exist
> >
> > [m...@ibb]# LC_ALL=C sort | nlp/moses/misc/processPhraseTable -ttable 0 0 -
> > -nscores 5 -out work/20100914/binary/model/phrase-table <
> > work/20100914/model/phrase-table.cleaned
> > processing ptree for stdin
> > ..................................................[phrase:500000]
> > ..........................distinct source phrases: 762319 distinct first
> > words of source phrases: 11727 number of phrase pairs (line count): 3639423
> > WARNING: there are src voc entries with no phrase translation: count 1156
> > There exists phrase translations for 10571 entries
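
One way to cross-check the figures in the log quoted above is to recompute them from the text phrase table. In that log, 10571 + 1156 = 11727, which matches the reported number of distinct first words, so the count in the warning looks like a tally of vocabulary entries rather than a line number in the file. The following is only a minimal sketch and not part of Moses: `phrase_table_stats` is a hypothetical helper name, and it assumes the usual ` ||| `-separated text layout with the source phrase in the first field.

```shell
# Hypothetical helper (not a Moses tool): summarize a text phrase table
# read from stdin. Assumes lines of the form "source ||| target ||| scores".
phrase_table_stats() {
  LC_ALL=C awk -F ' [|][|][|] ' '
    $0 == "" { next }                       # ignore empty lines
    {
      lines++
      # Count each distinct source phrase once.
      if (!($1 in src)) { src[$1] = 1; nsrc++ }
      # Count each distinct first word of a source phrase once.
      split($1, words, " ")
      if (!(words[1] in first)) { first[words[1]] = 1; nfirst++ }
    }
    END {
      printf "phrase pairs: %d\n", lines
      printf "distinct source phrases: %d\n", nsrc
      printf "distinct first words: %d\n", nfirst
    }'
}
```

It could be run on the table from this thread as `gzip -cd work/20100914/model/phrase-table.gz | phrase_table_stats`, and the three numbers compared with those printed by processPhraseTable.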
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
