I confess, the files were in DOS format the first time, but numberize_line (in train-factored-phrase-model.perl) couldn't cope with the ^Ms (it reported them as unknown words). The data now uses Unix line endings (I ran dos2unix just to make sure, but it made no changes to the files).
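To confirm that a corpus file really has no ^M (carriage return) characters left, and to strip CRLF endings if it does, a short check like the following could be run on each file before training. This is a minimal sketch in Python, assuming the files are read as raw bytes. The function names count_cr_lines and crlf_to_lf are hypothetical. The conversion only rewrites CRLF pairs, so it is not a full replacement for dos2unix.

```python
import sys
from pathlib import Path


def count_cr_lines(path):
    """Return how many lines in `path` contain a carriage return (b'\\r', shown as ^M)."""
    count = 0
    with open(path, "rb") as f:  # binary mode, so Python does no newline translation
        for line in f:
            if b"\r" in line:
                count += 1
    return count


def crlf_to_lf(path):
    """Rewrite `path` in place, turning CRLF line endings into LF.

    Only CRLF pairs are converted; a lone CR not followed by LF is left alone.
    Returns True if the file was changed, False if it was already clean.
    """
    p = Path(path)
    data = p.read_bytes()
    fixed = data.replace(b"\r\n", b"\n")
    if fixed != data:
        p.write_bytes(fixed)
        return True
    return False


if __name__ == "__main__":
    # Report-only by default: list how many lines in each file still carry a ^M.
    for name in sys.argv[1:]:
        print(f"{name}: {count_cr_lines(name)} line(s) containing ^M")
```

Run as, for example, `python check_cr.py corpus/projname.tok.cy corpus/projname.tok.en` (the script name is hypothetical, and the file names assume the usual corpus-stem-plus-language-suffix layout). A count of 0 for every file matches what dos2unix reported here.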
> ... whatever Windows tools you've used to manipulate the corpus
> files ...

I'm not doing, and don't plan to do, any corpus manipulation on Windows. Once this is running properly, the whole thing will have a Python script wrapped round it so it can be used on any platform.

Hieu Hoang wrote:
> My guess is whatever Windows tools you've used to manipulate the corpus
> files have added the Windows newline character, 0x0D (carriage return,
> decimal 13).
>
> You should run dos2unix on your corpus files before rerunning
> train-factored-phrase-model.perl.
>
>
> 2009/11/2 Ivan Uemlianin <[email protected]>
>
> Dear All
>
> I have Moses running fine on MacOSX. Now I am setting it up on Windows
> using Cygwin.
>
> The current error I'm working on is that the file model/lex.f2e
> occasionally has a space as its first field. Does anyone know how this
> comes about and/or how I can fix it?
>
> Some details:
>
> I'm running the simple train-factored-phrase-model.perl scripts from the
> step-through page, like this:
>
>     cmd = nohup nice \
>       /full/path/to/train-factored-phrase-model.perl \
>       -scripts-root-dir \
>       /full/path/to/scripts-20091102-1102 \
>       -root-dir \
>       /full/path/to/tf \
>       -corpus /full/path/to/tf/corpus/projname.tok \
>       -f cy \
>       -e en \
>       -alignment grow-diag-final-and \
>       -reordering msd-bidirectional-fe \
>       -lm 0:3:/full/path/to/tf/lm_irst/projname.en.irstlm.gz:1
>
> Everything seems to run OK (I mean it doesn't crash or freeze), but
> the translator doesn't work. stderr from the script has the following
> warnings:
>
>     Loading lexical translation table from
>     /home/ivan/moses_tools/factory/tf/model/lex.f2e
>     line 34 in /home/ivan/moses_tools/factory/tf/model/lex.f2e has wrong
>     number of tokens, skipping:
>     2 gwyntoedd gwyntoedd 0.0087719
>     line 83 in /home/ivan/moses_tools/factory/tf/model/lex.f2e has wrong
>     number of tokens, skipping:
>     2 droi droi 0.4000000
>
> The relevant lines in lex.f2e have a space as their first token, as in:
>
>     the gwyntoedd 0.0225564
>      gwyntoedd 0.0150376
>     a gwyntoedd 0.0075188
>
> Any help would be much appreciated. Once it's all working I'll post
> full guidance on getting Moses running under Cygwin.
>
> Best wishes
>
> Ivan
>
>
> --
> ********************************
> Ivan Uemlianin
>
> Canolfan Bedwyr
> Safle'r Normal Site
> Prifysgol Bangor University
> BANGOR
> Gwynedd
> LL57 2PZ
>
> [email protected]
> ********************************
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
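The lex.f2e lines that get skipped in this thread are the ones that do not split into three whitespace-separated fields (two words and a probability), such as a line whose first field is just a space. A check like the one below can list them before the scoring step runs. This is a minimal Python sketch under stated assumptions: the three-field layout is inferred from the sample lines quoted above, not taken from the Moses source; the file is assumed to be UTF-8; and the function name bad_lex_lines is hypothetical. Its 1-based line numbers may not match the numbers in the Moses warnings.

```python
import sys


def bad_lex_lines(path, expected_fields=3):
    """Yield (line_number, line) for lines of a lexical table that look malformed.

    A line is flagged if it does not split into `expected_fields`
    whitespace-separated tokens, or if it starts with whitespace
    (i.e. an empty first field). Line numbers are 1-based.
    """
    with open(path, encoding="utf-8", errors="replace") as f:
        for number, line in enumerate(f, start=1):
            text = line.rstrip("\n")
            if len(text.split()) != expected_fields or text[:1].isspace():
                yield number, text


if __name__ == "__main__":
    # Example: python check_lex.py model/lex.f2e model/lex.e2f
    for name in sys.argv[1:]:
        for number, text in bad_lex_lines(name):
            print(f"{name}:{number}: {text!r}")
```

Printing each line with `!r` makes a leading space or stray `\r` visible, which is the detail that is easy to miss when the lines are viewed in a terminal.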
