Hi Ivan,
I'm not sure if you have tried this.
In the script "train-factored-phrase-model.perl", you could look at the
subroutine get_lexical ("sub get_lexical"), which is where the counts
needed to compute lex.f2e and lex.e2f are accumulated.
In each for loop, where you see the updates:
$TOTAL_FOREIGN{$FOREIGN[$fi]}++;
and/or
$TOTAL_ENGLISH{$ENGLISH[$ei]}++;
you could check whether $FOREIGN[$fi] or $ENGLISH[$ei] contains only
whitespace characters and, if so, print the line id $alignment_id.
Knowing the line ids of such instances, you could then trace them back by
looking at the three files model/aligned.0.srcLang ($alignment_file_f),
model/aligned.0.tgtLang ($alignment_file_e), and model/aligned.alignType
($alignment_file_a). (Note: replace srcLang, tgtLang, and alignType with
the values for your system.)
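The check suggested above can be sketched outside the Perl script. The C++ fragment below is a minimal sketch, not the code of get_lexical: it assumes that words are separated by exactly one space (so every extra space yields an empty word), that each alignment line holds `fi-ej` index pairs, and the names splitOnSpace, isBlank, and findBlankWordLines are hypothetical. It returns the 0-based ids of sentence pairs in which an aligned word is empty or whitespace-only.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split on every single ' ' without collapsing runs, so "a  b" (two spaces)
// yields {"a", "", "b"}. Assumption: words are separated by exactly one space.
std::vector<std::string> splitOnSpace(const std::string& line) {
    std::vector<std::string> fields;
    std::string current;
    for (char c : line) {
        if (c == ' ') {
            fields.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back(current);
    return fields;
}

// True if the word is empty or consists only of whitespace characters.
bool isBlank(const std::string& word) {
    for (unsigned char c : word) {
        if (!std::isspace(c)) return false;
    }
    return true;
}

// Return the 0-based ids of sentence pairs where an aligned foreign or
// English word is blank. Alignment pairs are assumed to be numeric "fi-ei".
std::vector<std::size_t> findBlankWordLines(
        const std::vector<std::string>& foreign,
        const std::vector<std::string>& english,
        const std::vector<std::string>& alignments) {
    assert(foreign.size() == english.size());
    assert(english.size() == alignments.size());
    std::vector<std::size_t> hits;
    for (std::size_t id = 0; id < alignments.size(); ++id) {
        const auto f = splitOnSpace(foreign[id]);
        const auto e = splitOnSpace(english[id]);
        std::istringstream pairs(alignments[id]);
        std::string pair;
        while (pairs >> pair) {
            const auto dash = pair.find('-');
            // Skip malformed pairs such as "3", "-3", or "3-".
            if (dash == std::string::npos || dash == 0 || dash + 1 == pair.size()) continue;
            const std::size_t fi = std::stoul(pair.substr(0, dash));
            const std::size_t ei = std::stoul(pair.substr(dash + 1));
            if (fi >= f.size() || ei >= e.size()) continue;  // index out of range
            if (isBlank(f[fi]) || isBlank(e[ei])) {
                hits.push_back(id);
                break;  // one report per sentence pair is enough
            }
        }
    }
    return hits;
}
```

Each returned id points at the same line in the three aligned files, which is where the tracing described above would start.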
Hope that helps,
Cheers,
Thang
On Mon, Nov 2, 2009 at 11:41 PM, Ivan Uemlianin <[email protected]> wrote:
> Dear Thang
>
> Thank you for your comment.
>
> score.cpp, using tokenize in tables-core.cpp, tests whether each line of
> lex.f2e has the expected 3 tokens. What would be more useful for me
> would be to know:
>
> - how is lex.f2e generated?
> - what is lex.f2e supposed to represent?
> - why is the first item sometimes omitted?
>
> Can you help on any of those?
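Since the top reply says get_lexical accumulates the counts behind lex.f2e and lex.e2f, a minimal C++ sketch of turning such counts into lex.f2e-style lines may help with the first two questions. It assumes each line has the form `e f p(e|f)` with p(e|f) = count(f,e)/count(f), which fits the sample lines quoted further down (English word, then Welsh word, then a probability), but it is not taken from the script; Link and lexF2E are hypothetical names, and NULL entries for unaligned words are left out.

```cpp
#include <cassert>
#include <iomanip>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One alignment link between a foreign word and an English word: (f, e).
using Link = std::pair<std::string, std::string>;

// Build lines "e f p(e|f)" with p(e|f) = count(f, e) / count(f), where
// count(f) counts every link from f. Assumption: a simplified sketch of a
// lexical table, not the Moses code; NULL handling is omitted.
std::vector<std::string> lexF2E(const std::vector<Link>& links) {
    std::map<std::string, std::map<std::string, int>> pairCount;  // f -> e -> n
    std::map<std::string, int> totalForeign;                      // f -> n
    for (const auto& [f, e] : links) {
        ++pairCount[f][e];
        ++totalForeign[f];
    }
    std::vector<std::string> lines;
    for (const auto& [f, row] : pairCount) {
        for (const auto& [e, n] : row) {
            std::ostringstream out;
            out << e << ' ' << f << ' ' << std::fixed << std::setprecision(7)
                << static_cast<double>(n) / totalForeign.at(f);
            lines.push_back(out.str());
        }
    }
    return lines;
}
```

With an empty English word, the formatted line begins with a space and has only two whitespace-separated fields, the same shape as the lines quoted further down in this thread.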
>
> Best wishes
>
> Ivan
>
>
> Thang Luong Minh wrote:
> > Hi Ivan,
> >
> > You could fix it by looking at the method
> > void LexicalTable::load( char *fileName ) in the file
> > src/moses/scripts/training/phrase-extract/score.cpp, which contains:
> >
> > vector<string> token = tokenize( line );
> > if (token.size() != 3) {
> >   // Guard token[0] so an empty line does not index an empty vector.
> >   cerr << "line " << i << " in " << fileName
> >        << " has wrong number of tokens, skipping:\n"
> >        << token.size() << " " << (token.empty() ? string() : token[0])
> >        << " " << line << endl;
> >   continue;
> > }
> >
> > You could either modify the method tokenize or relax the if condition.
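Modifying the tokenization can be sketched as a splitter that keeps empty fields instead of collapsing runs of whitespace, so a line such as " gwyntoedd 0.0150376" parses as three fields with an empty first word. This is a minimal C++ sketch under that assumption; LexEntry and parseLexLine are hypothetical names, not part of score.cpp or tables-core.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// One entry of a lexical table line "e f prob".
struct LexEntry {
    std::string target;  // first field (may be empty)
    std::string source;  // second field
    double prob;
};

// Split on single spaces without collapsing runs, then require exactly three
// fields with a fully numeric third field; otherwise return std::nullopt.
std::optional<LexEntry> parseLexLine(const std::string& line) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true) {
        const auto pos = line.find(' ', start);
        const auto len = (pos == std::string::npos) ? std::string::npos : pos - start;
        fields.push_back(line.substr(start, len));
        if (pos == std::string::npos) break;
        start = pos + 1;
    }
    if (fields.size() != 3) return std::nullopt;
    try {
        std::size_t used = 0;
        const double p = std::stod(fields[2], &used);
        if (used != fields[2].size()) return std::nullopt;  // trailing junk
        return LexEntry{fields[0], fields[1], p};
    } catch (const std::logic_error&) {  // invalid_argument or out_of_range
        return std::nullopt;
    }
}
```

Keeping the empty field means the entry is loaded rather than skipped; whether an empty word belongs in the table at all is a separate question, which the tracing approach at the top of this thread addresses.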
> >
> > Hope that helps,
> >
> > Cheers.
> > Thang
> >
> > On Mon, Nov 2, 2009 at 8:29 PM, Ivan Uemlianin <[email protected]> wrote:
> >
> > Dear All
> >
> > I have Moses running fine on Mac OS X. Now I am setting it up on
> > Windows using Cygwin.
> >
> > The current error I'm working on is that the file model/lex.f2e
> > occasionally has a space as its first field. Does anyone know how this
> > comes about and/or how I can fix it?
> >
> > Some details:
> >
> > I'm running the simple train-factored-phrase-model.perl script from the
> > step-through page, like this:
> >
> >
> > cmd = nohup nice \
> > /full/path/to/train-factored-phrase-model.perl \
> > -scripts-root-dir \
> > /full/path/to/scripts-20091102-1102 \
> > -root-dir \
> > /full/path/to/tf \
> > -corpus /full/path/to/tf/corpus/projname.tok \
> > -f cy \
> > -e en \
> > -alignment grow-diag-final-and \
> > -reordering msd-bidirectional-fe \
> > -lm 0:3:/full/path/to/tf/lm_irst/projname.en.irstlm.gz:1
> >
> >
> > Everything seems to run OK (it doesn't crash or freeze), but the
> > translator doesn't work. stderr from the script has the following
> > warnings:
> >
> >
> > Loading lexical translation table from /home/ivan/moses_tools/factory/tf/model/lex.f2e
> > line 34 in /home/ivan/moses_tools/factory/tf/model/lex.f2e has wrong number of tokens, skipping:
> > 2 gwyntoedd gwyntoedd 0.0087719
> > line 83 in /home/ivan/moses_tools/factory/tf/model/lex.f2e has wrong number of tokens, skipping:
> > 2 droi droi 0.4000000
> >
> >
> > The relevant lines in lex.f2e have a space as their first token, as in:
> >
> >
> > the gwyntoedd 0.0225564
> > gwyntoedd 0.0150376
> > a gwyntoedd 0.0075188
> >
> >
> > Any help would be much appreciated. Once it's all working I'll post
> > full guidance on getting Moses running under Cygwin.
> >
> > Best wishes
> >
> > Ivan
> >
> >
> > --
> > ********************************
> > Ivan Uemlianin
> >
> > Canolfan Bedwyr
> > Safle'r Normal Site
> > Prifysgol Bangor University
> > BANGOR
> > Gwynedd
> > LL57 2PZ
> >
> > [email protected]
> > ********************************
> > _______________________________________________
> > Moses-support mailing list
> > [email protected]
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> >
> >
> >
> > --
> > Luong Minh Thang
> > WING group, School of Computing, National University of Singapore
> > http://wing.comp.nus.edu.sg/~lmthang
>
>
>