Hi Folks,

When attempting to build a Hiero model using 5K sentences for tuning, many more than that for testing, and many more again for the actual training corpus (~880K sentences), I get the error shown at the bottom of this message during the GIZA alignment phase of the pipeline.
Does anyone have a clue what this means? I have the full GIZA logs if they are useful.

I did find a thread on a VERY similar issue at [0]. The solution there seems to be to use absolute paths for all input data to the pipeline; however, that is exactly what I've done, e.g.

$JOSHUA/bin/pipeline.pl --rundir . --type hiero \
    --corpus /usr/local/joshua_input/commoncrawl.ru-en \
    --tune /usr/local/joshua_input/commoncrawl.ru-en.tune \
    --test /usr/local/joshua_input/commoncrawl.ru-en.test \
    --source en --target ru --rundir experiment1/1 \
    --readme "Experiment 1 Run 1 Hiero Russian to English Translation model" \
    --mbr

The parallel .en and .ru sentence files exist for each of the corpus, tune and test paths above.

[0] http://comments.gmane.org/gmane.comp.nlp.moses.user/10489

I have consistently had trouble generating models using GIZA. Is there a suggested alternative aligner which I should be trying out?

One last question: roughly how long should building a Hiero-based LM for a corpus of ~880K sentences take on, say, a MacBook Pro with a 2.7GHz Intel Core i7 and 16GB of memory? I remember reading a while ago on the old Joshua site that a pipeline would run in 10 or so minutes. This is clearly not the case, and I would like to share/compare some results, if possible, with others who are in the business of generating LMs and language packs.

Thanks

==========================================================
Executing: bash -c rm -f alignments/0/giza.ru.0-en.0/ru.0-en.0.A3.final.gz
Executing: bash -c gzip alignments/0/giza.ru.0-en.0/ru.0-en.0.A3.final
Waiting for second GIZA process...
(3) generate word alignment @ Fri Jul 15 16:38:42 PDT 2016
Combining forward and inverted alignment from files:
alignments/0/giza.en.0-ru.0/en.0-ru.0.A3.final.{bz2,gz}
alignments/0/giza.ru.0-en.0/ru.0-en.0.A3.final.{bz2,gz}
Executing: bash -c mkdir -p alignments/0/model
Executing: bash -c /usr/local/incubator-joshua/ext/symal/giza2bal.pl -d <(gzip -cd alignments/0/giza.ru.0-en.0/ru.0-en.0.A3.final.gz) -i <(gzip -cd alignments/0/giza.en.0-ru.0/en.0-ru.0.A3.final.gz) |/usr/local/incubator-joshua/ext/symal/symal -alignment="grow" -diagonal="yes" -final="yes" -both="no" -o=alignments/0/model/aligned.grow-diag-final
symal: computing grow alignment: diagonal (1) final (1)both-uncovered (0)
skip=<0> counts=<817962>
symal(9081,0x7fff76241310) malloc: *** error for object 0x7fff74472250: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
bash: line 1: 9080 Done /usr/local/incubator-joshua/ext/symal/giza2bal.pl -d <(gzip -cd alignments/0/giza.ru.0-en.0/ru.0-en.0.A3.final.gz) -i <(gzip -cd alignments/0/giza.en.0-ru.0/en.0-ru.0.A3.final.gz)
      9081 Abort trap: 6 | /usr/local/incubator-joshua/ext/symal/symal -alignment="grow" -diagonal="yes" -final="yes" -both="no" -o=alignments/0/model/aligned.grow-diag-final
Exit code: 134
ERROR: Can't generate symmetrized alignment file

--
*Lewis*