Hi Ivan,

I'm not sure if you have tried this.

In the script "train-factored-phrase-model.perl", you could look at the
subroutine "get_lexical", which is where the counts needed to compute
lex.f2e and lex.e2f are accumulated.

Inside each for loop there, you will see the updates:
$TOTAL_FOREIGN{$FOREIGN[$fi]}++;
and/or
$TOTAL_ENGLISH{$ENGLISH[$ei]}++;

At that point in the subroutine, you could check whether $FOREIGN[$fi] or
$ENGLISH[$ei] is empty or contains only whitespace characters and, if so,
print the line id $alignment_id.
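
If you would rather not edit the training script, here is a rough Python
sketch of the same check run directly on one of the aligned files. It
assumes tokens are separated by a single space and that the files are UTF-8
(both are my assumptions, so please verify them against your setup). The
names blank_token_positions and scan_file are made up for illustration.

```python
def blank_token_positions(line):
    """Return indexes of tokens that are empty or whitespace-only.

    Assumption: tokens are separated by exactly one space, so two
    consecutive spaces produce an empty token between them.
    A completely empty line is reported as position 0.
    """
    tokens = line.rstrip("\n").split(" ")
    return [i for i, tok in enumerate(tokens) if tok.strip() == ""]


def scan_file(path, first_id=1):
    """Return (line_id, positions) for every line with a blank token.

    first_id=1 assumes line ids are counted from 1; adjust if needed.
    """
    hits = []
    # newline="\n" splits lines only on "\n" and leaves any "\r" in place
    with open(path, encoding="utf-8", newline="\n") as handle:
        for line_id, line in enumerate(handle, start=first_id):
            positions = blank_token_positions(line)
            if positions:
                hits.append((line_id, positions))
    return hits
```

For example, scan_file("model/aligned.0.srcLang") would return the line ids
worth inspecting. An empty list only means no blank tokens were found under
the single-space splitting assumption.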

Knowing the line ids of such instances, you could then trace back by looking
at the three files model/aligned.0.srcLang ($alignment_file_f),
model/aligned.0.tgtLang ($alignment_file_e), and model/aligned.alignType
($alignment_file_a). (Note: replace srcLang, tgtLang, and alignType with the
values for your system; going by your command, that would presumably be cy,
en, and grow-diag-final-and.)
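
To make the tracing step concrete, here is a small Python sketch that prints
the same line id from all three files side by side. It uses repr() so that
stray spaces or carriage returns become visible. The function names are
hypothetical, and the 1-based line ids and UTF-8 encoding are my
assumptions rather than anything taken from the training script.

```python
def fetch_lines(path, line_ids, first_id=1):
    """Return {line_id: raw line} for the requested ids in one file."""
    wanted = set(line_ids)
    found = {}
    # newline="\n" keeps any "\r" in the returned text so it can be seen
    with open(path, encoding="utf-8", newline="\n") as handle:
        for line_id, line in enumerate(handle, start=first_id):
            if line_id in wanted:
                found[line_id] = line
    return found


def show_parallel(paths, line_ids, first_id=1):
    """Print the requested lines from each file, grouped by line id."""
    per_file = [fetch_lines(p, line_ids, first_id) for p in paths]
    for line_id in sorted(set(line_ids)):
        print(f"line {line_id}")
        for path, lines in zip(paths, per_file):
            # A missing id prints as None (file shorter than expected)
            print(f"  {path}: {lines.get(line_id)!r}")
```

You would call show_parallel with the three aligned file paths and the list
of suspicious line ids, then compare the source line, the target line, and
the alignment line for each id.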

Hope that helps,

Cheers,
Thang

On Mon, Nov 2, 2009 at 11:41 PM, Ivan Uemlianin <[email protected]> wrote:

> Dear Thang
>
> Thank you for your comment.
>
> score.cpp, using tokenize in tables-core.cpp, tests whether each line of
> lex.f2e has the expected 3 tokens.  What would be more useful for me
> would be to know:
>
> - how is lex.f2e generated?
> - what is lex.f2e supposed to represent?
> - why is the first item sometimes omitted?
>
> Can you help on any of those?
>
> Best wishes
>
> Ivan
>
>
> Thang Luong Minh wrote:
> > Hi Ivan,
> >
> > You could fix it by looking at the file
> > src/moses/scripts/training/phrase-extract/score.cpp, where the method
> > void LexicalTable::load( char *fileName ) contains:
> >
> > vector<string> token = tokenize( line );
> > if (token.size() != 3) {
> >      cerr << "line " << i << " in " << fileName
> >           << " has wrong number of tokens, skipping:\n"
> >           << token.size() << " " << token[0] << " " << line << endl;
> >      continue;
> > }
> >
> > You could either modify the method tokenize, or relax the if condition.
> >
> > Hope that helps,
> >
> > Cheers.
> > Thang
> >
> > On Mon, Nov 2, 2009 at 8:29 PM, Ivan Uemlianin <[email protected]> wrote:
> >
> >     Dear All
> >
> >     I have Moses running fine on MacOSX.  Now I am setting it up on Windows
> >     using Cygwin.
> >
> >     The current error I'm working on is that the file model/lex.f2e
> >     occasionally has a space as its first field.  Does anyone know how this
> >     comes about and/or how I can fix it?
> >
> >     Some details:
> >
> >     I'm running the simple train-factored-phrase-model.perl script from the
> >     step-through page, like this:
> >
> >
> >     cmd = nohup  nice    \
> >     /full/path/to/train-factored-phrase-model.perl  \
> >     -scripts-root-dir    \
> >       /full/path/to/scripts-20091102-1102           \
> >     -root-dir            \
> >       /full/path/to/tf   \
> >     -corpus /full/path/to/tf/corpus/projname.tok    \
> >     -f cy   \
> >     -e en   \
> >     -alignment grow-diag-final-and     \
> >     -reordering msd-bidirectional-fe   \
> >     -lm 0:3:/full/path/to/tf/lm_irst/projname.en.irstlm.gz:1
> >
> >
> >     Everything seems to run OK --- I mean it doesn't crash or freeze --- but
> >     the translator doesn't work.  stderr from the script has the following
> >     warnings:
> >
> >
> >     Loading lexical translation table from
> >     /home/ivan/moses_tools/factory/tf/model/lex.f2e
> >     line 34 in /home/ivan/moses_tools/factory/tf/model/lex.f2e has wrong
> >     number of tokens, skipping:
> >     2 gwyntoedd  gwyntoedd 0.0087719
> >     line 83 in /home/ivan/moses_tools/factory/tf/model/lex.f2e has wrong
> >     number of tokens, skipping:
> >     2 droi  droi 0.4000000
> >
> >
> >     The relevant lines in lex.f2e have a space as their first token, as in:
> >
> >
> >     the gwyntoedd 0.0225564
> >      gwyntoedd 0.0150376
> >     a gwyntoedd 0.0075188
> >
> >
> >     Any help would be much appreciated.  Once it's all working I'll post
> >     full guidance on getting Moses running under Cygwin.
> >
> >     Best wishes
> >
> >     Ivan
> >
> >
> >     --
> >     ********************************
> >     Ivan Uemlianin
> >
> >     Canolfan Bedwyr
> >     Safle'r Normal Site
> >     Prifysgol Bangor University
> >     BANGOR
> >     Gwynedd
> >     LL57 2PZ
> >
> >     [email protected]
> >     ********************************
> >     _______________________________________________
> >     Moses-support mailing list
> >     [email protected]
> >     http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> >
> >
> >
> > --
> > Luong Minh Thang
> > WING group, School of Computing, National University of Singapore
> > http://wing.comp.nus.edu.sg/~lmthang



-- 
Luong Minh Thang
WING group, School of Computing, National University of Singapore
http://wing.comp.nus.edu.sg/~lmthang