Hi Barry,

./clean-corpus-n.perl in trunk/scripts/training returned the following error:
./clean-corpus-n.perl corpus/* txt txt clean 1 50
clean-corpus.perl: processing corpus/200EnglishSens.txt.corpus/200HindiSens.txt & .txt to txt, cutoff clean-1
Use of uninitialized value $opn in open at ./clean-corpus-n.perl line 46.
Use of uninitialized value $opn in concatenation (.) or string at ./clean-corpus-n.perl line 46.
Can't open '' at ./clean-corpus-n.perl line 46.

Using train-factored-phrase-model.perl returned the following error:

Using SCRIPTS_ROOTDIR: /home/nakul/mosesdecoder/trunk/scripts
Using single-thread GIZA
ERROR: Cannot find mkcls, GIZA++, & snt2cooc.out in .
Did you install this script using 'make release'? at ./train-factored-phrase-model.perl line 205.

It seems that Moses does not recognize GIZA++ and mkcls; they are installed in different directories. I want to train them separately. Is it possible to do so?

Regarding the vcb files, I got them by executing the following command:

sudo ./plain2snt.out 200ESens.txt 200HSens.txt

which creates en.vcb, hn.vcb and bitext files (200ESens_200HSens.snt, 200HSens_200ESens.snt) in GIZA++ format.

--
Thanks & Regards,
nakul.

On Mon, Jan 31, 2011 at 3:54 PM, Barry Haddow <[email protected]> wrote:
> Hi Nakul
>
> Clean-corpus will get rid of long lines and lines with a high length ratio,
> which GIZA doesn't like. This could fix your first error.
>
> Run ./clean-corpus-n.perl --help for usage instructions.
>
> As to the second error, if you're not using the Moses scripts, how did you
> create the vcb files? It looks as though they don't match the corpus.
>
> best regards - Barry
>
> On Monday 31 January 2011 10:17, nakul sharma wrote:
> > Hi Barry,
> >
> > I am not training GIZA through Moses; I am training it independently.
> > Will it make any difference? Anyway, I do not have clean-corpus-n.perl
> > in GIZA. Please tell me what to do about it.
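For what it's worth, the cleaning step Barry describes (dropping over-long lines and sentence pairs with an extreme length ratio, the same ratio that triggers the GIZA++ fertility warning below) can be sketched roughly in Python. clean-corpus-n.perl is the real tool; this is only an illustration, and the thresholds and example sentences are made up:

```python
# Sketch of the corpus-cleaning idea: keep a sentence pair only if both
# sides fall within [min_len, max_len] tokens and the length ratio is
# not extreme (GIZA++'s fertility model complains above a ratio of 9).
def keep_pair(src, tgt, min_len=1, max_len=50, max_ratio=9.0):
    ns, nt = len(src.split()), len(tgt.split())
    if not (min_len <= ns <= max_len and min_len <= nt <= max_len):
        return False                      # too short or too long
    if max(ns, nt) / min(ns, nt) > max_ratio:
        return False                      # length ratio too extreme
    return True

pairs = [
    ("a", "b c d e f g h i j k l"),       # ratio 11:1 -> dropped
    ("this is fine", "yeh theek hai"),    # kept
]
kept = [p for p in pairs if keep_pair(*p)]
```

Running a filter like this over both sides of the parallel corpus before plain2snt.out is exactly what removes the "length ration more than the maximum allowed limit" warnings.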
> >
> > On Mon, Jan 31, 2011 at 3:07 PM, Barry Haddow <[email protected]> wrote:
> > > Hi Nakul
> > >
> > > Did you clean your corpus first (i.e. run clean-corpus-n.perl over it)?
> > >
> > > best regards - Barry
> > >
> > > On Monday 31 January 2011 04:20, nakul sharma wrote:
> > > > Hi all,
> > > >
> > > > I have g++ version 4.4.3 and Ubuntu 10.04 LTS. While training
> > > > GIZA++, I get the following error upon execution of the GIZA++ exe file:
> > > >
> > > > Reading vocabulary file from:200ESens.vcb
> > > > Reading vocabulary file from:200HSens.vcb
> > > > {WARNING:(a)truncated sentence 0}{WARNING:(a)truncated sentence 1}
> > > > WARNING: The following sentence pair has source/target sentence length ration
> > > > more than the maximum allowed limit for a source word fertility
> > > > source length = 1 target length = 11 ratio 11 ferility limit : 9
> > > > Shortening sentence
> > > > Sent No: 3 , No. Occurrences: 1
> > > > 0 254
> > > > 57 5 3 58 59 60 5 61 62 63 64
> > > >
> > > > I get this warning for almost every sentence number, and then for
> > > > sentence number 98 I get this error message:
> > > >
> > > > Sent No: 98 , No. Occurrences: 1
> > > > 0 457 458
> > > > 909 910 15 911 17 86 912 913 65 3 914 915 22 916 11 917 170 162 918 919
> > > > 3 684 22 8 920 921 22 8 333 922 923 924 22 925
> > > > ERROR: target word 937 is not in the vocabulary list.
> > > >
> > > > GIZA++ has generated only one file **.root.gfcs.
> > > >
> > > > Please tell me how to deal with this problem.
> > >
> > > --
> > > The University of Edinburgh is a charitable body, registered in
> > > Scotland, with registration number SC005336.
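The "target word 937 is not in the vocabulary list" error means the .snt file references a word id that the .vcb file never defines, which is what Barry means by the vcb files not matching the corpus. A rough sketch of such a consistency check, assuming the usual GIZA++ file layouts (a .vcb line is "id word count"; a .snt file is three-line blocks of pair count, source ids, target ids) and with made-up example data:

```python
# Sanity check for GIZA++ inputs: every word id used on the target side
# of the .snt file must be declared in the target .vcb file.
def missing_target_ids(vcb_lines, snt_lines):
    # .vcb format: "id word count" per line
    vocab = {int(line.split()[0]) for line in vcb_lines if line.strip()}
    missing = set()
    # .snt format: blocks of 3 lines (count, source ids, target ids)
    for i in range(0, len(snt_lines), 3):
        target_ids = (int(t) for t in snt_lines[i + 2].split())
        missing |= {t for t in target_ids if t not in vocab}
    return missing

vcb = ["2 house 5", "3 dog 2"]       # hypothetical target .vcb
snt = ["1", "2 3", "2 937"]          # id 937 is never declared above
bad = missing_target_ids(vcb, snt)
```

If this turns up ids, the .vcb and .snt files were built from different versions of the corpus (e.g. one was regenerated after cleaning and the other was not), and rerunning plain2snt.out on the current files should bring them back in sync.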
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
