Philipp

Thanks for your comment.

Note that exactly the same data works perfectly well with Moses on 
MacOSX.

The corpus files use Unix line endings.

The corpus data includes only 0-9, a-z, space (ASCII 32), newline 
(ASCII 10), and the following punctuation characters (all within ASCII, 
no weird Windows codes):

'
,
-
.
/
:

Are any of those ASCII characters too noisy?
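
For anyone who wants to repeat the check on their own corpus, here is a 
minimal sketch in Python. It is not part of Moses, and the function name 
is made up for illustration. It assumes the corpus is read as raw bytes 
and reports every byte outside the inventory listed above, with the line 
numbers where it occurs. A carriage return (ASCII 13) would show up here 
if a file had picked up Windows line endings along the way.

```python
import string

# Character inventory stated in this message: 0-9, a-z, space, newline,
# and the punctuation characters ' , - . / :
ALLOWED = set(string.digits + string.ascii_lowercase + " \n" + "',-./:")


def find_unexpected_bytes(data: bytes) -> dict:
    """Map each byte value outside ALLOWED to the set of 1-based line
    numbers on which it occurs. (Hypothetical helper, not a Moses tool.)"""
    allowed_bytes = {ord(c) for c in ALLOWED}
    found = {}
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for byte in line:  # iterating over bytes yields integers
            if byte not in allowed_bytes:
                found.setdefault(byte, set()).add(line_no)
    return found
```

To use it, read each corpus file in binary mode (open(path, "rb").read()) 
and pass the result in. An empty dict means the file contains nothing 
beyond the characters listed above.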

Ivan


Philipp Koehn wrote:
> Hi,
> 
> this looks like your training corpus has some noisy ASCII characters
> that are handled differently by C++ and Perl. You will need to clean up
> your corpus to remove them.
> 
> -phi
> 
> On Mon, Nov 2, 2009 at 12:29 PM, Ivan Uemlianin
> <[email protected]> wrote:
>> Dear All
>>
>> I have Moses running fine on MacOSX.  Now I am setting it up on Windows
>> using Cygwin.
>>
>> The current error I'm working on is that the file model/lex.f2e
>> occasionally has a space as its first field.  Does anyone know how this
>> comes about and/or how I can fix it?
>>
>> Some details:
>>
>> I'm running the simple train-factored-phrase-model.perl script from the
>> step-through page, like this:
>>
>>
>> cmd = nohup  nice    \
>> /full/path/to/train-factored-phrase-model.perl  \
>> -scripts-root-dir    \
>>   /full/path/to/scripts-20091102-1102           \
>> -root-dir            \
>>   /full/path/to/tf   \
>> -corpus /full/path/to/tf/corpus/projname.tok    \
>> -f cy   \
>> -e en   \
>> -alignment grow-diag-final-and     \
>> -reordering msd-bidirectional-fe   \
>> -lm 0:3:/full/path/to/tf/lm_irst/projname.en.irstlm.gz:1
>>
>>
>> Everything seems to run OK --- I mean it doesn't crash or freeze --- but
>> the translator doesn't work.  stderr from the script has the following
>> warnings:
>>
>>
>> Loading lexical translation table from
>> /home/ivan/moses_tools/factory/tf/model/lex.f2e
>> line 34 in /home/ivan/moses_tools/factory/tf/model/lex.f2e has wrong
>> number of tokens, skipping:
>> 2 gwyntoedd  gwyntoedd 0.0087719
>> line 83 in /home/ivan/moses_tools/factory/tf/model/lex.f2e has wrong
>> number of tokens, skipping:
>> 2 droi  droi 0.4000000
>>
>>
>> The relevant lines in lex.f2e have a space as their first token, as in:
>>
>>
>> the gwyntoedd 0.0225564
>>  gwyntoedd 0.0150376
>> a gwyntoedd 0.0075188
>>
>>
>> Any help would be much appreciated.  Once it's all working I'll post
>> full guidance on getting Moses running under Cygwin.
>>
>> Best wishes
>>
>> Ivan
>>
>>
>> --
>> ********************************
>> Ivan Uemlianin
>>
>> Canolfan Bedwyr
>> Safle'r Normal Site
>> Prifysgol Bangor University
>> BANGOR
>> Gwynedd
>> LL57 2PZ
>>
>> [email protected]
>> ********************************
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>


-- 
********************************
Ivan Uemlianin

Canolfan Bedwyr
Safle'r Normal Site
Prifysgol Bangor University
BANGOR
Gwynedd
LL57 2PZ

[email protected]
********************************
