Hi dev@,
I ran into a bit of bother whilst attempting to complete the example at [0].
Joshua master is installed correctly.
The problem I am having is almost exactly the one described at [1].

I attempted to build the model using the following parameters:

$JOSHUA/bin/pipeline.pl --type hiero --rundir 1 --readme "Baseline Hiero
run" --source es --target en --witten-bell --corpus
$SPANISH/corpus/asr/callhome_train --corpus
$SPANISH/corpus/asr/fisher_train --tune  $SPANISH/corpus/asr/fisher_dev
--test  $SPANISH/corpus/asr/callhome_devtest --lm-order 3

It seems that the initial stages of the pipeline run and complete
successfully, with the following output:

[source-numlines] retrieved cached result =>   151810

However, when the pipeline progresses to alignment with GIZA, the generated
log [2] indicates a fatal error that I have not seen before.
As you can see, there are a great many sentence mismatch errors, and the
final alignment phase ends with the following log output:

ERROR: Can't generate symmetrized alignment file
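
One thing worth ruling out for the sentence mismatch errors is a difference
in line counts between the two sides of the training corpora. The sketch
below is only a quick sanity check, not part of Joshua: check_parallel is a
hypothetical helper name, and it assumes each --corpus prefix has matching
".es" and ".en" files. It does not prove that a count mismatch is the cause
here.

```shell
# Hypothetical helper (not part of Joshua): compare the line counts of the
# source and target sides of a parallel corpus given its path prefix.
# Returns 0 if the counts match, 1 on a mismatch, 2 if a file is missing.
check_parallel() {
  prefix=$1
  src=$2
  tgt=$3
  for f in "$prefix.$src" "$prefix.$tgt"; do
    if [ ! -f "$f" ]; then
      echo "missing file: $f" >&2
      return 2
    fi
  done
  # Strip whitespace because some wc implementations pad their output.
  n_src=$(wc -l < "$prefix.$src" | tr -d '[:space:]')
  n_tgt=$(wc -l < "$prefix.$tgt" | tr -d '[:space:]')
  if [ "$n_src" -ne "$n_tgt" ]; then
    echo "MISMATCH $prefix: $src=$n_src $tgt=$n_tgt"
    return 1
  fi
  echo "OK $prefix: $n_src lines"
}

# Example call (assumes $SPANISH is set as in the pipeline command above):
#   check_parallel "$SPANISH/corpus/asr/fisher_train" es en
```

Running it once per --corpus, --tune and --test prefix should show quickly
whether any pair is out of step before the aligner is involved at all.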

I then tried changing the aligner to berkeley (and the LM to berkeleylm), as
suggested in [1] and based upon some advice given by Matt in a more recent
thread, as follows:

$JOSHUA/bin/pipeline.pl --type hiero --rundir 3 --readme "Baseline Hiero
run 3" --source es --target en --lm-gen berkeleylm --lm berkeleylm
--aligner berkeley --corpus $SPANISH/corpus/asr/callhome_train --corpus
$SPANISH/corpus/asr/fisher_train --tune  $SPANISH/corpus/asr/fisher_dev
--test  $SPANISH/corpus/asr/callhome_devtest --lm-order 3

However, this results in the following output in the early stages of
the pipeline:

[source-numlines] retrieved cached result =>   151810
[berkeley-aligner-chunk-0] rebuilding...
  dep=alignments/0/word-align.conf [CHANGED]
  dep=/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/4/data/train/splits/corpus.es.0 [CHANGED]
  dep=/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/4/data/train/splits/corpus.en.0 [CHANGED]
  dep=alignments/0/training.align [NOT FOUND]
  cmd=java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++alignments/0/word-align.conf
  JOB FAILED (return code 1)
[aligner-combine] rebuilding...
  dep=alignments/0/training.align [NOT FOUND]
  dep=alignments/training.align [NOT FOUND]
  cmd=cat alignments/0/training.align > alignments/training.align
  JOB FAILED (return code 1)
cat: alignments/0/training.align: No such file or directory

It turns out, of course, that the '++alignments/0/word-align.conf' is not
present. I am looking for that bug in the codebase right now and will
try to submit a PR.

Lewis

[0]
https://github.com/apache/incubator-joshua/tree/master/examples#building-a-spanish----english-translation-model-using-the-fisher-spanish-callhome-corpus
[1] https://groups.google.com/forum/#!topic/joshua_support/CvNjIRboixc
[2] https://paste.apache.org/wjm9

-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney
