Hi Tom,
We don't really keep documentation on dependencies. We just try not to
add a dependency until it's really needed. I only know the usual suspects:
boost
perl
python
gcc
And lots of optional libraries, e.g. irstlm, srilm, tcmalloc...
I don't know the exact versions of each. They're subject to change anyway,
depending on added functionality and how much people complain.
You're probably in a better position to know about the exact
dependencies since you have customers bending your ear about them.
On 25/12/2012 01:52, Tom Hoar wrote:
Merry Christmas everyone.
Thanks, Hieu. No, your suggestion is not a problem. Documenting the
limitation and trapping the front-end is a viable solution.
We found the problem when a customer reported our code improperly
handled ASCII vs UTF-8 with European accented characters. I told the
staff to test our fixes with a worst-case scenario. They chose Thai
paths. Nice, huh? Since then, we fell back to "easier" European
accented characters, Chinese and Japanese. All of the non-Thai
characters seem to work fine. We can only replicate the error with
Thai. So, this seems to be a bug in Perl and its handling of Thai
characters with the system() call.
This troubleshooting exercise reveals some additional challenges that
we shared with our MS Windows team. Right now, that team is
documenting the dependencies in train-model.perl. Can you or your team
share any documentation of the dependencies?
Thanks,
Tom
On 2012-12-25 06:23, Hieu Hoang wrote:
Hi Tom,
In an ideal world, non-ASCII characters (and spaces and miscellaneous
other characters) wouldn't be a problem. Unfortunately, the scripts
aren't tested very often for those cases, and it's too difficult to
make them work reliably for anything but ASCII paths, especially as
the code is spread over the Moses and Mgiza scripts.
You're probably better off constraining your user front-end likewise.
Is that a problem for you?
merry xmas
hieu
On 24/12/2012 09:44, Tom Hoar wrote:
I've traced a problem in train-model.perl but don't know how to fix
it. I'm using Moses 0.91 and the error occurs when calling
merge_alignment.py.
Line 1988, system(@_), fails when the output path contains some
extended (Thai) UTF-8 characters.
The log output shows:
Executing: /home/tahoar/bin/merge_alignment.py
/home/tahoar/share/domy/TRAININGS/alignments/align-?????_tm-??????
-???/giza.??????-???/??????-???.A3.final.part*> /home/tahoar
/share/domy/TRAININGS/alignments/align-?????_tm-??????-???/giza.
??????-???/??????-???.A3.final
sh: cannot create
/home/tahoar/share/domy/TRAININGS/alignments/align-?????_tm-???
????-???/giza.???????-???/???????-???.A3.final: Directory nonexistent
Contrary to the log error message, the correct output directory
exists. Three things to note:
1) The corrupted UTF-8 characters above appear that way in the log
echoed to the terminal; they are not an artifact of this email
2) I can run the "Executing: xxx" line from the terminal and it
works fine
3) I patched merge_alignment.py to save the sys.argv list to a text
file just after the test for command arguments. The file never gets
created. So the Perl "system" call never actually executes
merge_alignment.py.
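For reference, the check in note 3 can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the helper name dump_argv and the log path are hypothetical, and it assumes Python 3.

```python
import sys


def dump_argv(argv, log_path):
    """Write each argument on its own line, numbered, as UTF-8.

    errors="surrogateescape" lets arguments that were not valid UTF-8
    on the command line be written back as their original bytes.
    """
    with open(log_path, "w", encoding="utf-8", errors="surrogateescape") as fh:
        for index, arg in enumerate(argv):
            fh.write("%d\t%s\n" % (index, arg))


# Hypothetical call site, placed right after the argument-count test
# in the script being debugged:
#     dump_argv(sys.argv, "/tmp/merge_alignment.argv.txt")
```

If the file is missing after a run, the script was never started, which is what points the finger at the calling side.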
I attached two proposed changes that I used to resolve the problem.
I updated merge_alignment.py so the first argument is the output
file name and all remaining arguments are input files. The new
merge_alignment.py uses glob to support wildcards in the input file
names, and it sends output to the file instead of stdout. The second
change is train-model.perl to match the command line changes to
merge_alignment.py.
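The command-line handling described above might look roughly like this. It is only a sketch under stated assumptions, not the attached patch: the function names are hypothetical, Python 3 is assumed, and plain concatenation stands in for the real merge logic of merge_alignment.py, which is not reproduced here.

```python
import glob


def expand_inputs(patterns):
    """Expand each wildcard pattern; keep a literal name if nothing matches."""
    paths = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        paths.extend(matches if matches else [pattern])
    return paths


def merge_to_file(output_path, patterns):
    """Write the merged input to output_path instead of stdout.

    Concatenation is a placeholder for the real merge step.
    """
    with open(output_path, "wb") as out:
        for path in expand_inputs(patterns):
            with open(path, "rb") as part:
                out.write(part.read())


def main(argv):
    """argv[1] is the output file; argv[2:] are input files or patterns."""
    if len(argv) < 3:
        print("usage: merge_alignment.py OUTPUT INPUT [INPUT ...]")
        return 1
    merge_to_file(argv[1], argv[2:])
    return 0
```

With this layout the caller passes the wildcard unexpanded and needs no ">" redirection, so the shell never has to open the output path itself.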
Unfortunately, this only fixes the system call to merge_alignment.py.
There are many other system calls that redirect their output, and
each of them shows the same problem of corrupting the UTF-8 output
path.
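One general way to take the shell out of the redirection is for the calling process to open the output file itself and hand the command over as an argument list. The sketch below shows the idea in Python only for illustration; train-model.perl would need the equivalent in Perl, the helper name run_with_output is hypothetical, and whether this avoids the corruption seen here is untested.

```python
import subprocess


def run_with_output(args, output_path):
    """Run args without a shell, sending its stdout to output_path.

    The parent opens the output file, so no shell has to parse or
    re-encode the path. Wildcards in args are NOT expanded here and
    would have to be expanded by the caller beforehand.
    """
    with open(output_path, "wb") as out:
        completed = subprocess.run(args, stdout=out)
    return completed.returncode
```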
Any suggestions?
Tom
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support