I confess the files were in DOS format the first time, but
numberize_line (in train-factored-phrase-model.perl) couldn't cope with
the ^Ms (it reported them as unknown words). The data now uses Unix
line endings (I ran dos2unix just to make sure, but it made no changes to
the files).
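To double-check that no carriage returns (`\r`, shown as ^M) survive in the corpus, each file can be scanned as below. This is a minimal Python sketch and is not part of Moses. The function names `count_cr_lines` and `strip_cr` are hypothetical. `strip_cr` only converts CRLF pairs to LF, which is roughly what dos2unix does by default.

```python
import sys
from pathlib import Path


def count_cr_lines(path):
    """Return how many lines in the file contain a carriage return (0x0D, shown as ^M)."""
    data = Path(path).read_bytes()
    # Split on \n only: bytes.splitlines() would treat a bare \r as a line break too.
    return sum(1 for line in data.split(b"\n") if b"\r" in line)


def strip_cr(path):
    """Rewrite the file with CRLF line endings converted to LF.

    Returns True if the file was changed, False if it was already clean.
    """
    p = Path(path)
    data = p.read_bytes()
    fixed = data.replace(b"\r\n", b"\n")
    if fixed != data:
        p.write_bytes(fixed)
        return True
    return False


if __name__ == "__main__":
    # Hypothetical usage: python check_cr.py corpus/projname.tok.cy corpus/projname.tok.en
    for name in sys.argv[1:]:
        print(f"{name}: {count_cr_lines(name)} line(s) containing \\r")
```

A count of zero for every file confirms what the dos2unix run suggested.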

 > ... whatever Windows tools you've used to manipulate the corpus
 > files ...

I'm not doing, and don't plan to do, any corpus manipulation on Windows.
Once this is running properly, the whole thing will have a Python
script wrapped around it so that it can be used on any platform.
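A wrapper of that kind could start as small as the Python sketch below. It rebuilds the training command quoted further down as an argument list, using the same options and values, and runs it with `subprocess`. The names `build_train_cmd` and `run_training`, their parameters, and all paths are hypothetical. Calling `perl` explicitly means the script does not need to be marked executable.

```python
import subprocess


def build_train_cmd(train_script, scripts_root_dir, root_dir, corpus, lm_file,
                    f="cy", e="en", lm_factor=0, lm_order=3, lm_type=1,
                    alignment="grow-diag-final-and",
                    reordering="msd-bidirectional-fe"):
    """Return the training command as an argument list.

    Uses the same options as the command quoted in this thread. The -lm value
    is assembled as factor:order:file:type, matching the 0:3:...:1 used there.
    """
    return [
        "perl", str(train_script),  # invoke perl directly rather than relying on a shebang
        "-scripts-root-dir", str(scripts_root_dir),
        "-root-dir", str(root_dir),
        "-corpus", str(corpus),
        "-f", f,
        "-e", e,
        "-alignment", alignment,
        "-reordering", reordering,
        "-lm", f"{lm_factor}:{lm_order}:{lm_file}:{lm_type}",
    ]


def run_training(cmd, log_path):
    """Run cmd with stdout and stderr both written to log_path; return the exit status."""
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

Passing a list rather than a single shell string avoids quoting problems with paths. The `nohup nice` prefix from the original command is left out because those tools exist only under Cygwin on Windows. On Unix-like systems they could be prepended to the list.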

Hieu Hoang wrote:
> My guess is that whatever Windows tools you've used to manipulate the corpus
> files have added Windows line endings: a carriage return (0x0D, decimal 13)
> before each line feed (0x0A, decimal 10).
> 
> You should run dos2unix on your corpus files before rerunning
> train-factored-phrase-model.perl.
> 
> 
> 2009/11/2 Ivan Uemlianin <[email protected]>
> 
>     Dear All
> 
>     I have Moses running fine on Mac OS X. Now I am setting it up on Windows
>     using Cygwin.
> 
>     The current error I'm working on is that the file model/lex.f2e
>     occasionally has a space as its first field.  Does anyone know how this
>     comes about and/or how I can fix it?
> 
>     Some details:
> 
>     I'm running the simple train-factored-phrase-model.perl script from the
>     step-through page, like this:
> 
> 
>     nohup nice \
>       /full/path/to/train-factored-phrase-model.perl \
>       -scripts-root-dir /full/path/to/scripts-20091102-1102 \
>       -root-dir /full/path/to/tf \
>       -corpus /full/path/to/tf/corpus/projname.tok \
>       -f cy \
>       -e en \
>       -alignment grow-diag-final-and \
>       -reordering msd-bidirectional-fe \
>       -lm 0:3:/full/path/to/tf/lm_irst/projname.en.irstlm.gz:1
> 
> 
>     Everything seems to run OK (I mean it doesn't crash or freeze), but
>     the translator doesn't work. stderr from the script has the following
>     warnings:
> 
> 
>     Loading lexical translation table from /home/ivan/moses_tools/factory/tf/model/lex.f2e
>     line 34 in /home/ivan/moses_tools/factory/tf/model/lex.f2e has wrong number of tokens, skipping:
>     2 gwyntoedd  gwyntoedd 0.0087719
>     line 83 in /home/ivan/moses_tools/factory/tf/model/lex.f2e has wrong number of tokens, skipping:
>     2 droi  droi 0.4000000
> 
> 
>     The relevant lines in lex.f2e start with a space, i.e. their first field is empty, as in:
> 
> 
>     the gwyntoedd 0.0225564
>      gwyntoedd 0.0150376
>     a gwyntoedd 0.0075188
> 
> 
>     Any help would be much appreciated.  Once it's all working I'll post
>     full guidance on getting Moses running under Cygwin.
> 
>     Best wishes
> 
>     Ivan
> 
> 
>     --
>     ********************************
>     Ivan Uemlianin
> 
>     Canolfan Bedwyr
>     Safle'r Normal Site
>     Prifysgol Bangor University
>     BANGOR
>     Gwynedd
>     LL57 2PZ
> 
>     [email protected]
>     ********************************
>     _______________________________________________
>     Moses-support mailing list
>     [email protected]
>     http://mailman.mit.edu/mailman/listinfo/moses-support
> 
> 
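The warnings quoted above come from lex.f2e lines that do not split into the three fields Moses expects: two words and a probability. A line that starts with a space has an empty first field, so presumably Moses sees only two tokens. The Python sketch below lists every such line. It also lists corpus lines that would yield an empty token when split on single spaces (leading, trailing, or doubled spaces). `find_bad_lex_lines` and `find_empty_tokens` are hypothetical helpers, and the files are assumed to be UTF-8. The idea that empty tokens in the tokenized corpus cause the empty lex entries is a hypothesis to check, not a confirmed diagnosis.

```python
def find_bad_lex_lines(path):
    """Yield (line_number, line) for lex-table lines that do not have
    exactly three non-empty, single-space-separated fields."""
    with open(path, encoding="utf-8") as fh:
        for n, raw in enumerate(fh, start=1):
            line = raw.rstrip("\n")
            fields = line.split(" ")
            if len(fields) != 3 or "" in fields:
                yield n, line


def find_empty_tokens(path):
    """Yield line numbers of non-blank corpus lines that contain an empty
    token when split on single spaces (leading, trailing, or doubled spaces)."""
    with open(path, encoding="utf-8") as fh:
        for n, raw in enumerate(fh, start=1):
            line = raw.rstrip("\n")
            if line and "" in line.split(" "):
                yield n
```

For example, running `find_bad_lex_lines` on model/lex.f2e should report the lines behind each "wrong number of tokens" warning. Running `find_empty_tokens` on the `.en` and `.cy` corpus files shows whether any sentence pairs contain stray spaces.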

