Hi Folks,

Follow up. It seems that when I clean the .cachepipe directory, as well as all of the existing alignments etc. from the previous run, and re-run the entire pipeline, this issue disappears. I have no real explanation for why this happened. All I can say is that it is best to run experiments in separate directories whenever you make a tweak to a pipeline.

Lewis
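P.S. For what it is worth, here is a minimal sketch of the "separate directory per experiment" habit. The helper names (fresh_run_dir, stale_state) are made up for illustration and are not part of Joshua. The sketch also assumes that the leftovers worth checking for are a .cachepipe directory and an alignments directory inside the run directory, which is simply what I cleaned out here.

```shell
# Create a new, uniquely named run directory under base ($1) using a
# label ($2), and print its path. Hypothetical helper, not part of Joshua.
fresh_run_dir() {
  local base="$1" label="$2"
  mkdir -p "$base"
  # mktemp -d creates the directory and guarantees the name is unused
  mktemp -d "$base/$label-$(date +%Y%m%d-%H%M%S)-XXXXXX"
}

# Print any leftovers from an earlier run found in the run directory ($1).
# The two paths checked are assumptions based on what was cleaned above.
stale_state() {
  local run_dir="$1" p
  for p in "$run_dir/.cachepipe" "$run_dir/alignments"; do
    if [ -e "$p" ]; then
      echo "$p"
    fi
  done
  return 0
}
```

The path printed by fresh_run_dir could then be passed to pipeline.pl via --rundir instead of ".", and stale_state gives a quick way to see whether an existing directory still holds state from an earlier run before reusing it.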
On Thu, Oct 20, 2016 at 12:20 AM, lewis john mcgibbney <[email protected]> wrote:

> Hi dev@,
>
> Sitting facing some issues with Thrax using Joshua master branch.
> I invoke Joshua as follows
>
> /usr/local/incubator-joshua/bin/pipeline.pl --rundir . --type hiero \
>   --corpus /usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en \
>   --tune /usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en.tune \
>   --test /usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en.test \
>   --source en --target ru \
>   --readme "Experiment 1 Run 1 of ru --> en model training" \
>   --aligner berkeley --tmp /usr/local/hadoop-2.5.2/hadoop_tmp_dir \
>   --first-step thrax --no-prepare --alignment alignments/training.align \
>   --hadoop-mem 10g
>
> I make the first step thrax as I have previously computed my alignment as
> indicated by the arguments.
> My Thrax log is available at
> https://www.dropbox.com/s/pxld70ki656fn13/thrax.log?dl=0
> In the log you will see an exception as follows
>
> 16/10/19 22:56:59 WARN mapred.LocalJobRunner: job_local1314413872_0002
> java.lang.Exception: java.lang.RuntimeException: Word id 2146928632 out of range 0 1727042
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
> Caused by: java.lang.RuntimeException: Word id 2146928632 out of range 0 1727042
>     at edu.jhu.thrax.hadoop.features.WordLexicalProbabilityCalculator$Partition.getPartition(WordLexicalProbabilityCalculator.java:133)
>     at edu.jhu.thrax.hadoop.features.WordLexicalProbabilityCalculator$Partition.getPartition(WordLexicalProbabilityCalculator.java:121)
>     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
>     at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>     at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>     at edu.jhu.thrax.hadoop.features.WordLexicalProbabilityCalculator$Map.map(WordLexicalProbabilityCalculator.java:82)
>     at edu.jhu.thrax.hadoop.features.WordLexicalProbabilityCalculator$Map.map(WordLexicalProbabilityCalculator.java:28)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
>
> I see no other issues until the end of the Thrax log where I see
>
> class edu.jhu.thrax.hadoop.jobs.TargetWordGivenSourceWordProbabilityJob FAILED
> class edu.jhu.thrax.hadoop.jobs.OutputJob PREREQ_FAILED
> class edu.jhu.thrax.hadoop.features.annotation.AnnotationFeatureJob PREREQ_FAILED
> class edu.jhu.thrax.hadoop.features.mapred.TargetPhraseGivenSourceFeature SUCCESS
> class edu.jhu.thrax.hadoop.jobs.ExtractionJob SUCCESS
> class edu.jhu.thrax.hadoop.features.mapred.SourcePhraseGivenTargetFeature SUCCESS
> class edu.jhu.thrax.hadoop.jobs.VocabularyJob SUCCESS
> class edu.jhu.thrax.hadoop.jobs.SourceWordGivenTargetWordProbabilityJob FAILED
>
> This issue has previously been reported by Matt over on
> https://github.com/joshua-decoder/thrax/issues/10
>
> Debugging right now folks.
> Lewis
>
> --
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney

--
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney
