I posted this on GoogleCode Issues for BerkeleyAligner, but it's
pretty inactive. So, I thought I'd try here, too. 

I'm using the unsupervised version 2.1 available from the repository to
create word alignments. This is a small corpus of roughly 40,000 phrase
pairs for testing and development. The machine has a 6-core AMD Opteron,
16 GB RAM and 1 TB of available hard drive space. Java/OS version as
follows:

 user@moses0:~$ java -version
 java version "1.6.0_20"
 OpenJDK Runtime Environment (IcedTea6 1.9.10) (6b20-1.9.10-0ubuntu1~10.04.3)
 OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)

I run the same command on the same corpus multiple times. Most times, the
command completes training successfully. Sometimes it fails with an
AssertionError, each time in a different location, normally in the first
or second iteration of Model 1. I list the command line followed by the
errors. I have also tried reducing numThreads to 5, but it still throws
an error.

I suspect environment problems more than corpus problems. Any suggestions?

Thanks,
Tom

/usr/bin/java -server
 -Xms1024m
 -Xmx2048m
 -Xss768k
 -ea
 -jar /usr/local/bin/berkeleyaligner.jar
 -EMWordAligner.numThreads 6
 -Data.trainSources /opt/library/BUILDS/tm/demo_tm/bitext.list
 -Data.foreignSuffix nl
 -Data.englishSuffix en
 -Data.testSources
 -exec.execDir /opt/library/TRAININGS/alignments/align-demo_tm-en-nl/berk.classes
 -exec.create True
 -Evaluator.writeGIZA True
 -Main.SaveParams True
 -Main.alignTraining True
 -Main.forwardModels MODEL1 HMM
 -Main.reverseModels MODEL1 HMM
 -Main.iters 5 5
 -Main.mode JOINT JOINT
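
A side note on the -ea flag above: it turns on Java assert statements, so if
these AssertionErrors come from asserts inside the library (I have not
checked the source), they would only fire when that flag is set. A minimal
sketch for confirming whether a given JVM invocation really has assertions
on; the class name AssertionCheck is made up for illustration and is not
part of BerkeleyAligner:

```java
// Hypothetical helper for illustration only; not part of BerkeleyAligner.
class AssertionCheck {
    /** Returns true only if assertions are enabled for this class (e.g. via -ea). */
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment inside the assert is evaluated only when assertions are on.
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
    }
}
```

Running it once with `java -ea AssertionCheck` and once without the flag
should show the difference.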

The Error:

main() {
 Execution directory: /opt/library/TRAININGS/alignments/align-demo_tm-en-nl/berk.classes
 Preparing Training Data [2.3s, cum. 2.4s]
 41410 training, 0 test
 Training models: 2 stages {
 Training stage 1: MODEL1 and MODEL1 jointly for 5 iterations {
 Initializing forward model [9.1s, cum. 9.1s]
 Initializing reverse model [7.9s, cum. 17s]
 Joint Train: 41410 sentences, jointly {
 Iteration 1/5 {
 Sentence 2/41410
 Sentence 1/41410
 Sentence 5/41410
 Sentence 13/41410
 WARNING: Translation model update concurrency error
 Sentence 54/41410
 WARNING: Translation model update concurrency error
 Sentence 207/41410
 WARNING: Translation model update concurrency error
 WARNING: Translation model update concurrency error
 ERROR: java.lang.AssertionError:
 fig.basic.StringDoubleMap.find(StringDoubleMap.java:397)
 fig.basic.StringDoubleMap.incr(StringDoubleMap.java:78)
 fig.basic.String2DoubleMap.incr(String2DoubleMap.java:51)
 edu.berkeley.nlp.wordAlignment.SentencePairState.updateTransProbs(SentencePairState.java:79)
 edu.berkeley.nlp.wordAlignment.distortion.Model1or2SentencePairState.updateNewParams(Model1or2SentencePairState.java:91)
 edu.berkeley.nlp.wordAlignment.EMWordAligner.run(EMWordAligner.java:231)
 edu.berkeley.nlp.concurrent.WorkQueue.run(WorkQueue.java:70)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 java.lang.Thread.run(Thread.java:636)
 1 errors, 4 warnings
 ... 585 lines omitted ...
 }

Here's another error for the same corpus:

main() {
 Execution directory: /opt/library/TRAININGS/alignments/align-demo_tm-en-nl/berk.classes
 Preparing Training Data [2.3s, cum. 2.3s]
 41410 training, 0 test
 Training models: 2 stages {
 Training stage 1: MODEL1 and MODEL1 jointly for 5 iterations {
 Initializing forward model [9.2s, cum. 9.2s]
 Initializing reverse model [8.0s, cum. 17s]
 Joint Train: 41410 sentences, jointly {
 Iteration 1/5 {
 Sentence 2/41410
 Sentence 1/41410
 Sentence 4/41410
 Sentence 15/41410
 Sentence 67/41410
 WARNING: Translation model update concurrency error
 Sentence 279/41410
 Sentence 911/41410
 WARNING: Translation model update concurrency error
 Sentence 2218/41410
 Sentence 3908/41410
 Sentence 5776/41410
 Sentence 7744/41410
 Sentence 9737/41410
 Sentence 11746/41410
 Sentence 13767/41410
 Sentence 15780/41410
 Sentence 17802/41410
 Sentence 19841/41410
 Sentence 21912/41410
 Sentence 24000/41410
 Sentence 26120/41410
 Sentence 28239/41410
 Sentence 30359/41410
 Sentence 32490/41410
 Sentence 34634/41410
 Sentence 36776/41410
 Sentence 38928/41410
 ... 40883 lines omitted ...
 } [19s, cum. 19s]
 Iteration 2/5 {
 Sentence 1/41410
 Sentence 5/41410
 Sentence 4/41410
 ERROR: java.lang.AssertionError:
 fig.basic.StringDoubleMap.put(StringDoubleMap.java:72)
 fig.basic.StringDoubleMap.switchMapType(StringDoubleMap.java:309)
 fig.basic.StringDoubleMap.find(StringDoubleMap.java:386)
 fig.basic.StringDoubleMap.incr(StringDoubleMap.java:78)
 fig.basic.String2DoubleMap.incr(String2DoubleMap.java:51)
 edu.berkeley.nlp.wordAlignment.SentencePairState.updateTransProbs(SentencePairState.java:79)
 edu.berkeley.nlp.wordAlignment.distortion.Model1or2SentencePairState.updateNewParams(Model1or2SentencePairState.java:91)
 edu.berkeley.nlp.wordAlignment.EMWordAligner$1.run(EMWordAligner.java:232)
 edu.berkeley.nlp.concurrent.WorkQueue$1.run(WorkQueue.java:70)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 java.lang.Thread.run(Thread.java:636)
 ... 25 lines omitted ...
 }
 Sentence 28/41410
 Sentence 29/41410
 Sentence 30/41410
 Sentence 31/41410
 Sentence 32/41410
 Sentence 33/41410
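
In case it helps frame the question: both traces, like the "concurrency
error" warnings, sit on the count update path (StringDoubleMap.incr called
from thread pool workers). The sketch below shows the general kind of
unsynchronized read-modify-write that can make a run succeed one time and
misbehave the next. It is purely illustrative and only my guess at the class
of problem: the names are made up, a plain array stands in for fig's map,
and I have not looked at how BerkeleyAligner actually guards its updates.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; none of these names come from BerkeleyAligner or fig.
class SharedCountSketch {
    // One shared accumulator, standing in for a shared count table.
    private final double[] counts = new double[1];
    private final Object lock = new Object();

    // Read-modify-write without a lock: concurrent callers can overwrite each other.
    void incrUnsafe() { counts[0] += 1.0; }

    // Same update, but only one thread at a time may perform it.
    void incrLocked() { synchronized (lock) { counts[0] += 1.0; } }

    /** Runs `threads` workers doing `perThread` increments each; returns the final total. */
    static double total(final boolean locked, int threads, final int perThread) {
        final SharedCountSketch s = new SharedCountSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(new Runnable() {
                public void run() {
                    for (int i = 0; i < perThread; i++) {
                        if (locked) s.incrLocked(); else s.incrUnsafe();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return s.counts[0];
    }

    public static void main(String[] args) {
        System.out.println("expected: " + (6 * 100000));
        System.out.println("locked:   " + total(true, 6, 100000));
        // The unlocked total varies from run to run and may come out lower.
        System.out.println("unlocked: " + total(false, 6, 100000));
    }
}
```

The locked variant always reaches the expected total; the unlocked one
depends on thread timing, which would be consistent with failures that move
around between runs and do not go away at numThreads 5.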

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
