Hi Tom,

I also encountered problems using BerkeleyAligner and reported them on the project's website. As you can see, it seems to have been abandoned for a few years now. I personally wouldn't use it again until someone starts picking up the phone.

What's the motivation for using BerkeleyAligner, as opposed to, say, GIZA++/MGIZA?

On 26/02/2012 12:07, Tom Hoar wrote:

Has anyone else had java.lang.AssertionErrors or any other kind of stability problem with BerkeleyAligner?

Thanks,
Tom

-------- Original Message --------

Subject:        BerkeleyAligner AssertionError
Date:   Sun, 19 Feb 2012 13:01:06 +0700
From:   Tom Hoar <[email protected]>
To:     Moses support <[email protected]>

I posted this on GoogleCode Issues for BerkeleyAligner, but it's pretty inactive. So, I thought I'd try here, too.

I'm using the unsupervised version 2.1 available from the repository to create
word alignments. This is a small corpus of roughly 40,000 phrase pairs for
testing and development. The machine has a 6-core AMD Opteron, 16 GB of RAM,
and 1 TB of available hard drive space. The Java/OS version is as follows:
   user@moses0:~$ java -version
   java version "1.6.0_20"
   OpenJDK Runtime Environment (IcedTea6 1.9.10) (6b20-1.9.10-0ubuntu1~10.04.3)
   OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)

I run the same command on the same corpus multiple times. Most times, the
command completes training successfully. Sometimes it fails with an
AssertionError, at a different location each time, normally in the first or
second iteration of Model 1. The command line is listed below, followed by the
errors. I have also tried reducing numThreads to 5, but it still throws an
error.

I suspect environment problems more than corpus problems. Any suggestions?

Thanks, Tom

/usr/bin/java -server \
   -Xms1024m \
   -Xmx2048m \
   -Xss768k \
   -ea \
   -jar /usr/local/bin/berkeleyaligner.jar \
   -EMWordAligner.numThreads 6 \
   -Data.trainSources /opt/library/BUILDS/tm/demo_tm/bitext.list \
   -Data.foreignSuffix nl \
   -Data.englishSuffix en \
   -Data.testSources \
   -exec.execDir /opt/library/TRAININGS/alignments/align-demo_tm-en-nl/berk.classes \
   -exec.create True \
   -Evaluator.writeGIZA True \
   -Main.SaveParams True \
   -Main.alignTraining True \
   -Main.forwardModels MODEL1 HMM \
   -Main.reverseModels MODEL1 HMM \
   -Main.iters 5 5 \
   -Main.mode JOINT JOINT

The Error:

main() {
   Execution directory: /opt/library/TRAININGS/alignments/align-demo_tm-en-nl/berk.classes
   Preparing Training Data [2.3s, cum. 2.4s]
   41410 training, 0 test
   Training models: 2 stages {
     Training stage 1: MODEL1 and MODEL1 jointly for 5 iterations {
       Initializing forward model [9.1s, cum. 9.1s]
       Initializing reverse model [7.9s, cum. 17s]
       Joint Train: 41410 sentences, jointly {
         Iteration 1/5 {
           Sentence 2/41410
           Sentence 1/41410
           Sentence 5/41410
           Sentence 13/41410
           WARNING: Translation model update concurrency error
           Sentence 54/41410
           WARNING: Translation model update concurrency error
           Sentence 207/41410
           WARNING: Translation model update concurrency error
           WARNING: Translation model update concurrency error
           ERROR: java.lang.AssertionError:
fig.basic.StringDoubleMap.find(StringDoubleMap.java:397)
fig.basic.StringDoubleMap.incr(StringDoubleMap.java:78)
fig.basic.String2DoubleMap.incr(String2DoubleMap.java:51)
edu.berkeley.nlp.wordAlignment.SentencePairState.updateTransProbs(SentencePairState.java:79)
edu.berkeley.nlp.wordAlignment.distortion.Model1or2SentencePairState.updateNewParams(Model1or2SentencePairState.java:91)
edu.berkeley.nlp.wordAlignment.EMWordAligner$1.run(EMWordAligner.java:231)
edu.berkeley.nlp.concurrent.WorkQueue$1.run(WorkQueue.java:70)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
java.lang.Thread.run(Thread.java:636)
1 errors, 4 warnings
           ... 585 lines omitted ...
         }

Here's another error for the same corpus:


main() {
   Execution directory: /opt/library/TRAININGS/alignments/align-demo_tm-en-nl/berk.classes
   Preparing Training Data [2.3s, cum. 2.3s]
   41410 training, 0 test
   Training models: 2 stages {
     Training stage 1: MODEL1 and MODEL1 jointly for 5 iterations {
       Initializing forward model [9.2s, cum. 9.2s]
       Initializing reverse model [8.0s, cum. 17s]
       Joint Train: 41410 sentences, jointly {
         Iteration 1/5 {
           Sentence 2/41410
           Sentence 1/41410
           Sentence 4/41410
           Sentence 15/41410
           Sentence 67/41410
           WARNING: Translation model update concurrency error
           Sentence 279/41410
           Sentence 911/41410
           WARNING: Translation model update concurrency error
           Sentence 2218/41410
           Sentence 3908/41410
           Sentence 5776/41410
           Sentence 7744/41410
           Sentence 9737/41410
           Sentence 11746/41410
           Sentence 13767/41410
           Sentence 15780/41410
           Sentence 17802/41410
           Sentence 19841/41410
           Sentence 21912/41410
           Sentence 24000/41410
           Sentence 26120/41410
           Sentence 28239/41410
           Sentence 30359/41410
           Sentence 32490/41410
           Sentence 34634/41410
           Sentence 36776/41410
           Sentence 38928/41410
           ... 40883 lines omitted ...
         } [19s, cum. 19s]
         Iteration 2/5 {
           Sentence 1/41410
           Sentence 5/41410
           Sentence 4/41410
           ERROR: java.lang.AssertionError:
fig.basic.StringDoubleMap.put(StringDoubleMap.java:72)
fig.basic.StringDoubleMap.switchMapType(StringDoubleMap.java:309)
fig.basic.StringDoubleMap.find(StringDoubleMap.java:386)
fig.basic.StringDoubleMap.incr(StringDoubleMap.java:78)
fig.basic.String2DoubleMap.incr(String2DoubleMap.java:51)
edu.berkeley.nlp.wordAlignment.SentencePairState.updateTransProbs(SentencePairState.java:79)
edu.berkeley.nlp.wordAlignment.distortion.Model1or2SentencePairState.updateNewParams(Model1or2SentencePairState.java:91)
edu.berkeley.nlp.wordAlignment.EMWordAligner$1.run(EMWordAligner.java:232)
edu.berkeley.nlp.concurrent.WorkQueue$1.run(WorkQueue.java:70)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
java.lang.Thread.run(Thread.java:636)
           ... 25 lines omitted ...
         }
         Sentence 28/41410
         Sentence 29/41410
         Sentence 30/41410
         Sentence 31/41410
         Sentence 32/41410
         Sentence 33/41410
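
Both traces pass through fig.basic.StringDoubleMap.incr on worker threads, and the log prints "Translation model update concurrency error" shortly before each failure. One possible reading (an assumption, not something confirmed from the BerkeleyAligner source) is that several threads update a shared count map without synchronization. The sketch below only illustrates that general idea: SynchronizedCounts, its methods, and the "huis|house" key are hypothetical names and are not part of BerkeleyAligner. It shows a shared count map that stays consistent because every access goes through one lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; this is not BerkeleyAligner code.
class SynchronizedCounts {
    // A plain HashMap is not thread-safe, so it is only touched under the lock.
    private final Map<String, Double> counts = new HashMap<String, Double>();

    // Read-modify-write on the shared map, guarded by the object's monitor.
    synchronized void incr(String key, double amount) {
        Double old = counts.get(key);
        counts.put(key, old == null ? amount : old + amount);
    }

    synchronized double get(String key) {
        Double value = counts.get(key);
        return value == null ? 0.0 : value;
    }

    // Starts numThreads workers; each adds 1.0 to the same key perThread times.
    // Returns the final count once all workers have finished.
    static double totalAfterConcurrentIncrements(int numThreads, final int perThread) {
        final SynchronizedCounts shared = new SynchronizedCounts();
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        for (int t = 0; t < numThreads; t++) {
            pool.execute(new Runnable() {
                public void run() {
                    for (int i = 0; i < perThread; i++) {
                        shared.incr("huis|house", 1.0);
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return shared.get("huis|house");
    }

    public static void main(String[] args) {
        // 6 threads, matching the numThreads value used in the command above.
        System.out.println(totalAfterConcurrentIncrements(6, 10000));
    }
}
```

With the lock in place, the total always equals numThreads * perThread. If the synchronized keywords were removed, the same workers could lose updates or corrupt the map's internal state, which would match the kind of intermittent, location-shifting failure described above. Whether that is what actually happens inside StringDoubleMap remains a guess.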



_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
