I can now confirm that steps 1-7 of the training process work in Cygwin if the
user edits the $SCRIPTS_ROOT_DIR variable to a relative path (relative to the
user's present working directory, not the script's). I cannot comment on step 8
because there is no documentation on how to request a generation model, so any
attempt to test the step results in it being skipped.

The most important changes I have identified are:

1) The need to use relative paths to make the system calls work
2) The need to create a /dev/stdin file for step 3 compatibility
3) The need to make two separate system calls in step 3 in place of the
original one

I haven't come up with a portable solution to 1) yet. Until we have one, I
suppose the best we can do is to document the fact that the user should edit
the $SCRIPTS_ROOT_DIR variable to a path relative to the intended present
working directory.
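
One direction that might be worth trying for 1), offered only as an untested
sketch: Perl's File::Spec module can compute a relative path from an absolute
one, so the script could derive the relative form itself instead of asking the
user to edit it. The helper name relative_scripts_root and the example paths
below are made up for illustration, and I have not verified that this actually
cures the system call problem under Cygwin.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Spec;

# Hypothetical helper (not part of the training script): express an
# absolute scripts root relative to a base directory. The base defaults
# to the present working directory of the user.
sub relative_scripts_root {
    my ($abs_root, $base) = @_;
    $base = getcwd() unless defined $base;
    return File::Spec->abs2rel($abs_root, $base);
}

# Example with made-up paths; a real run would pass the configured
# scripts root and omit the second argument.
my $rel = relative_scripts_root('/home/user/moses/scripts', '/home/user/work');
print "$rel\n";
```

The result would then be used wherever the script currently builds command
paths from $SCRIPTS_ROOT_DIR.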

For 3) here is the change I made to the word_align function:

    # First call: write the giza2bal output to a temporary file
    safesystem("$GIZA2BAL -d $__ALIGNMENT_INV_CMD -i $__ALIGNMENT_CMD > ".
           "alignment.tmp")
      ||
        die "Can't generate symmetrized alignment file\n";

    # Second call: have symal read the temporary file instead of a pipe
    safesystem("$SYMAL -alignment=\"$__symal_a\" -diagonal=\"$__symal_d\" ".
           "-final=\"$__symal_f\" -both=\"$__symal_b\" -i=alignment.tmp -o=".
           "$___ALIGNMENT_FILE.$___ALIGNMENT")
      ||
        die "Can't generate symmetrized alignment file\n";

in place of the one liner:

 safesystem("$GIZA2BAL -d $__ALIGNMENT_INV_CMD -i $__ALIGNMENT_CMD |". 
           "$SYMAL -alignment=\"$__symal_a\" -diagonal=\"$__symal_d\" ".
           "-final=\"$__symal_f\" -both=\"$__symal_b\" >".
           "$___ALIGNMENT_FILE.$___ALIGNMENT") 
      ||
        die "Can't generate symmetrized alignment file\n"

Of course, this change only works if the user has changed the $SCRIPTS_ROOT_DIR
variable by hand, as the $GIZA2BAL and $SYMAL variables are built from the
$SCRIPTS_ROOT_DIR variable.

This is the best I can offer at the moment, but it will get the user to step 7
as long as both changes 1) and 2) have been implemented.

Note that the change suggested above could be improved. System calls could be
used to 'mkdir -p /dev' and 'touch /dev/stdin' before calling giza2bal.pl and
symal.exe, and to 'rm -f alignment.tmp' after calling symal.exe, just to clean
up.
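
A minimal sketch of that improvement, assuming nothing beyond core Perl
modules. The helper names ensure_placeholder and run_via_tempfile are
hypothetical, and plain system() stands in for the script's own safesystem();
in the script itself the path would be /dev/stdin and the temporary file
alignment.tmp.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);

# Hypothetical helper: make sure a placeholder file exists at $path
# (for example /dev/stdin under Cygwin). Leaves an existing file alone,
# so it should be harmless on systems that already provide one.
sub ensure_placeholder {
    my ($path) = @_;
    return 1 if -e $path;
    eval { make_path(dirname($path)); 1 } or return 0;  # like 'mkdir -p'
    open(my $fh, '>>', $path) or return 0;              # like 'touch'
    close($fh);
    return 1;
}

# Hypothetical helper: run the first command with its output redirected
# to $tmp, then run the second command (which is expected to read $tmp),
# and remove $tmp afterwards whether or not the commands succeeded.
sub run_via_tempfile {
    my ($first_cmd, $second_cmd, $tmp) = @_;
    my $ok = system("$first_cmd > $tmp") == 0
          && system($second_cmd) == 0;
    unlink $tmp if -e $tmp;                             # like 'rm -f'
    return $ok ? 1 : 0;
}
```

In word_align, the first command would be the giza2bal.pl call and the second
the symal call with -i=alignment.tmp, with the die message kept for the case
where run_via_tempfile returns false.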

I haven't tested jhndrsn's method of dealing with the /dev/stdin problem yet,
and I'm not sure how we could implement it in a portable manner (i.e. without
breaking Linux installations).

The changes I am suggesting could be merged into a WIN32-only version of the
train-factored-phrase-model.perl script. Would you be willing to consider a
WIN32 version of the script, or do you want the changes to be portable? If the
latter, I guess we could add a --cygwin flag and if/else logic to the script
to handle things differently under a Cygwin environment.

Please let me know which of the two evils you would be willing to consider:

1) A separate WIN32 train-factored-phrase-model.perl script

or

2) A --cygwin flag to the script which allows for different logic under cygwin

I suppose we could consider a third possibility as well - automatic cygwin
detection using the `uname` command.
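
As a rough sketch of how automatic detection might look: Perl reports the
platform in the built-in $^O variable, which is 'cygwin' under Cygwin's Perl,
so `uname` may only be needed as a fallback. The helper name is_cygwin is
hypothetical, and its two optional arguments exist only so that the logic can
be exercised on any machine.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when running under Cygwin, 0 otherwise.
# Checks Perl's own OS name first and falls back to `uname -s`, whose
# output starts with "CYGWIN" under Cygwin.
sub is_cygwin {
    my ($os_name, $uname) = @_;
    $os_name = $^O unless defined $os_name;
    return 1 if $os_name eq 'cygwin';
    $uname = `uname -s 2>/dev/null` unless defined $uname;
    return (defined $uname && $uname =~ /^CYGWIN/i) ? 1 : 0;
}

# Example: pick the strategy for step 3 based on the platform.
my $strategy = is_cygwin() ? 'two calls via alignment.tmp' : 'single pipe';
print "Step 3 would use: $strategy\n";
```

With something like this, the script could keep the original one-liner on
Linux and switch to the two-call version only under Cygwin, without a new
flag.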

Let me know your thoughts and I'll see if I can get a workable patch together.

  

Quoting Hieu Hoang <[EMAIL PROTECTED]>:

> Thanks john. 
> 
> Glad to see you guys hacking away with the cygwin version. 
> 
> You're right in that it takes too long to tune, except for small datasets. 
> 
> I think its good to have a cygwin version because there's some people, like
> JC, who are constrained on the machine they can use. And its nice to have it
> working under windows for the development environment, and when u're using
> your laptop etc.
> 
> Not sure how to get your patch into the production. Can merge it and test it
> on Linux and cygwin but will be a bit of a nightmare. 
> 
> Unlike the decoder development, there's no regression test or rollout
> procedure for the training scripts, everything is a bit hotch potch. So
> can't be sure the changes work until someone screams.
> 
> I think it too far down the line now. 
> 
> JC - if u can get something working, let me know so we can commit the change
> for prosperity
> 
> -----Original Message-----
> From: John Henderson [mailto:[EMAIL PROTECTED] 
> Sent: 18 February 2008 20:49
> To: J C Read
> Cc: Hieu Hoang
> Subject: Re: [Moses-support] Cygwin: step 4 and step 5
> 
> Folks -
> 
> I'm attaching the "svn diff" output on my cygwin box.  Note that it's versus
> version 1430 and has lots of little differences in addition to the big ones.
> You can probably get through them pretty quickly to find the relevant ones.
> 
> Wish I had more time to clean this up for you, but I hope you find it
> useful.
> 
> -John
> 
> 


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
