This is as far as I've got. If possible, I'd appreciate it if we could move the conversation over to the Jira ticket:
https://issues.apache.org/jira/browse/JOSHUA-304

Lewis
On Tue, Aug 23, 2016 at 8:58 PM, lewis john mcgibbney <lewi...@apache.org> wrote:

> Hi dev@,
>
> I ran into a bit of bother whilst attempting to complete the example at [0].
> Joshua master is installed correctly.
> The problem I am having is almost exactly the one described at [1].
>
> I attempt to build the model using the following parameters:
>
> $JOSHUA/bin/pipeline.pl --type hiero --rundir 1 \
>   --readme "Baseline Hiero run" --source es --target en --witten-bell \
>   --corpus $SPANISH/corpus/asr/callhome_train \
>   --corpus $SPANISH/corpus/asr/fisher_train \
>   --tune $SPANISH/corpus/asr/fisher_dev \
>   --test $SPANISH/corpus/asr/callhome_devtest \
>   --lm-order 3
>
> The initial stages of the pipeline seem to run and complete well, with the
> following output:
>
> [source-numlines] retrieved cached result => 151810
>
> However, when the pipeline progresses to alignment with GIZA, the generated
> log indicates a fatal error that I am not familiar with [1]. I've never
> seen it.
> As you can see, there are many sentence mismatch errors within the final
> alignment phase, with the following log output:
>
> ERROR: Can't generate symmetrized alignment file
>
> I then tried changing the aligner to berkeley (with berkeleylm for the
> language model), as suggested in [1] and also based upon some advice given
> by Matt in a more recent thread, as follows:
>
> $JOSHUA/bin/pipeline.pl --type hiero --rundir 3 \
>   --readme "Baseline Hiero run 3" --source es --target en \
>   --lm-gen berkeleylm --lm berkeleylm --aligner berkeley \
>   --corpus $SPANISH/corpus/asr/callhome_train \
>   --corpus $SPANISH/corpus/asr/fisher_train \
>   --tune $SPANISH/corpus/asr/fisher_dev \
>   --test $SPANISH/corpus/asr/callhome_devtest \
>   --lm-order 3
>
> However, this results in the following output early in the pipeline:
>
> [source-numlines] retrieved cached result => 151810
> [berkeley-aligner-chunk-0] rebuilding...
> dep=alignments/0/word-align.conf [CHANGED]
> dep=/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/4/data/train/splits/corpus.es.0 [CHANGED]
> dep=/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/4/data/train/splits/corpus.en.0 [CHANGED]
> dep=alignments/0/training.align [NOT FOUND]
> cmd=java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++alignments/0/word-align.conf
> JOB FAILED (return code 1)
> [aligner-combine] rebuilding...
> dep=alignments/0/training.align [NOT FOUND]
> dep=alignments/training.align [NOT FOUND]
> cmd=cat alignments/0/training.align > alignments/training.align
> JOB FAILED (return code 1)
> cat: alignments/0/training.align: No such file or directory
>
> It turns out, of course, that the '++alignments/0/word-align.conf' is not
> present. So I am looking for that bug in the codebase right now and will
> try to submit a PR.
>
> Lewis
>
> [0] https://github.com/apache/incubator-joshua/tree/master/examples#building-a-spanish----english-translation-model-using-the-fisher-spanish-callhome-corpus
> [1] https://groups.google.com/forum/#!topic/joshua_support/CvNjIRboixc
> [2] https://paste.apache.org/wjm9
>
> --
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney
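
Both failures in the logs above involve files the aligner expects: GIZA reports sentence mismatches, and the Berkeley run cannot find its inputs or outputs. A quick sanity check along the following lines might help narrow things down before the alignment step. This is only a sketch, assuming the source and target sides are plain-text files with one sentence per line. The helper names count_lines and check_parallel are hypothetical and not part of Joshua.

```python
from pathlib import Path


def count_lines(path):
    """Count lines in a file, including a final line without a trailing newline."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def check_parallel(source, target):
    """Return a list of problems found for one source/target file pair.

    An empty list means both files exist and have the same number of lines.
    This is a hypothetical helper for illustration, not a Joshua API.
    """
    problems = []
    for path in (source, target):
        if not Path(path).is_file():
            problems.append(f"missing file: {path}")
    if problems:
        # Line counts cannot be compared if either side is absent.
        return problems

    n_source = count_lines(source)
    n_target = count_lines(target)
    if n_source != n_target:
        problems.append(
            f"line count mismatch: {source} has {n_source}, "
            f"{target} has {n_target}"
        )
    return problems
```

Calling check_parallel on a pair such as the corpus.es.0 and corpus.en.0 splits named in the log would return an empty list when both files exist and line up, and a list of human-readable problems otherwise. It only checks existence and line counts, so it would not catch empty or overlong sentences, which aligners may also reject.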