Thanks, Hieu, for your confirmation. I agree with your assessment about the support. I think I'll ask a Java guru friend if he's up to the challenge of troubleshooting the error, but I'm not sure if the correct source is published on Google Code.

As for motivation, mainly it's just a case of having a variety of tools. We're about to release an update of DoMY with a Python plugin shell for train-model.perl that supports all 71 command-line arguments (and another plugin with support for mert-moses.pl's 45 arguments). The training plugin also adds integrated support for various other substitutes for steps 1, 2 and 3. Users can choose to use BerkeleyAligner to create GIZA++ files and then use train-model.perl's step 3 to generate all possible alignment types from BerkeleyAligner's GIZA files. BerkeleyAligner support also includes the softunion and low-posterior alignment types. Users can even configure the plugin to create recaser alignments instead of running train-recaser.perl. By the way, I found the Moses manual.pdf refers to the recaser alignment as "word-to-word", so I used that term instead of the "1-to-1" from our previous discussion.
A strong reason for wanting to use BerkeleyAligner: our experience has shown that its alignments often give better results with Asian-to-European language pairs. This requires some experimentation with each corpus to confirm. Also, on my 6-core AMD Opteron CPU, Java efficiently uses 600% of the CPU time for all of the processing... that is, if it doesn't crash in the first two rounds. Maybe that's one reason for the AssertionError, but throttling back to only 5 threads doesn't help. Train-model.perl's MGIZA++ option runs multi-threaded at ~500-550%, but only while the MGIZA++ binary itself is engaged; the other steps (corpus preparation, mkcls and snt2cooc) throttle back to single-threaded. Our testing shows end-to-end processing of steps 1, 2 and 3 is a little faster with BerkeleyAligner... again, if it doesn't crash up front.

Any Java hounds out there who want to look at the BerkeleyAligner code?
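In the meantime, one possible workaround sketch (untested, and it only masks the symptom): our command passes -ea, which enables Java assertions, and every failure is an AssertionError preceded only by warnings. Dropping -ea (or explicitly passing -da) might let training run to completion, at the cost of skipping whatever invariant check is failing:

```shell
# Untested workaround: run WITHOUT -ea so the JVM skips assertion checks.
# java.lang.AssertionError is only thrown when assertions are enabled.
/usr/bin/java -server -Xms1024m -Xmx2048m -Xss768k \
  -jar /usr/local/bin/berkeleyaligner.jar \
  -EMWordAligner.numThreads 6 \
  ...   # remaining options unchanged from the full command below
```

Whether the resulting alignments are trustworthy after a skipped assertion is a separate question, of course.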
Tom
On Mon, 27 Feb 2012 13:22:44 +0000, Hieu Hoang wrote:

hi tom

i also encountered problems using BerkeleyAligner and reported it on their website. As you can see, it seems to have been abandoned for a few years now. I personally wouldn't use it again until someone starts picking up their phone.

What's the motivation for using Berkeley Aligner, as opposed to, say, GIZA++/MGIZA?
On 26/02/2012 12:07, Tom Hoar wrote:

Has anyone else had java.lang.AssertionErrors or any other kinds of stability problems with BerkeleyAligner?

Thanks,
Tom
-------- Original Message --------
SUBJECT: BerkeleyAligner AssertionError
DATE: Sun, 19 Feb 2012 13:01:06 +0700
FROM: Tom Hoar [1]
TO: Moses support [2]
I posted this on Google Code Issues for BerkeleyAligner, but it's pretty inactive, so I thought I'd try here, too.

I'm using the unsupervised version 2.1 available from the repository to create word alignments. This is a small 40,000-phrase-pair corpus for testing and development. The machine is a 6-core AMD Opteron with 16 GB RAM and 1 TB of available hard drive space. Java/OS version as follows:

user@moses0:~$ java -version
java version "1.6.0_20"
OpenJDK Runtime Environment (IcedTea6 1.9.10) (6b20-1.9.10-0ubuntu1~10.04.3)
OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)

I run the same command on the same corpus multiple times. Most times, the command completes training successfully. Sometimes it fails with an AssertionError in a different location, normally in the first or second iteration of model 1. I list the command line followed by the errors. I have also tried reducing numThreads to 5, but it still throws an error.

I suspect environment problems more than corpus problems. Any suggestions?

Thanks, Tom
/usr/bin/java -server
-Xms1024m
-Xmx2048m
-Xss768k
-ea
-jar /usr/local/bin/berkeleyaligner.jar
-EMWordAligner.numThreads 6
-Data.trainSources /opt/library/BUILDS/tm/demo_tm/bitext.list
-Data.foreignSuffix nl
-Data.englishSuffix en
-Data.testSources
-exec.execDir /opt/library/TRAININGS/alignments/align-demo_tm-en-nl/berk.classes
-exec.create True
-Evaluator.writeGIZA True
-Main.SaveParams True
-Main.alignTraining True
-Main.forwardModels MODEL1 HMM
-Main.reverseModels MODEL1 HMM
-Main.iters 5 5
-Main.mode JOINT JOINT
The Error:

main() {
Execution directory: /opt/library/TRAININGS/alignments/align-demo_tm-en-nl/berk.classes
Preparing Training Data [2.3s, cum. 2.4s]
41410 training, 0 test
Training models: 2 stages {
Training stage 1: MODEL1 and MODEL1 jointly for 5 iterations {
Initializing forward model [9.1s, cum. 9.1s]
Initializing reverse model [7.9s, cum. 17s]
Joint Train: 41410 sentences, jointly {
Iteration 1/5 {
Sentence 2/41410
Sentence 1/41410
Sentence 5/41410
Sentence 13/41410
WARNING: Translation model update concurrency error
Sentence 54/41410
WARNING: Translation model update concurrency error
Sentence 207/41410
WARNING: Translation model update concurrency error
WARNING: Translation model update concurrency error
ERROR:
java.lang.AssertionError:
fig.basic.StringDoubleMap.find(StringDoubleMap.java:397)
fig.basic.StringDoubleMap.incr(StringDoubleMap.java:78)
fig.basic.String2DoubleMap.incr(String2DoubleMap.java:51)
edu.berkeley.nlp.wordAlignment.SentencePairState.updateTransProbs(SentencePairState.java:79)
edu.berkeley.nlp.wordAlignment.distortion.Model1or2SentencePairState.updateNewParams(Model1or2SentencePairState.java:91)
edu.berkeley.nlp.wordAlignment.EMWordAligner.run(EMWordAligner.java:231)
edu.berkeley.nlp.concurrent.WorkQueue.run(WorkQueue.java:70)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
java.lang.Thread.run(Thread.java:636)
1 errors, 4 warnings
... 585 lines omitted ...
}
Here's another error for the same corpus:

main() {
Execution directory: /opt/library/TRAININGS/alignments/align-demo_tm-en-nl/berk.classes
Preparing Training Data [2.3s, cum. 2.3s]
41410 training, 0 test
Training models: 2 stages {
Training stage 1: MODEL1 and MODEL1 jointly for 5 iterations {
Initializing forward model [9.2s, cum. 9.2s]
Initializing reverse model [8.0s, cum. 17s]
Joint Train: 41410 sentences, jointly {
Iteration 1/5 {
Sentence 2/41410
Sentence 1/41410
Sentence 4/41410
Sentence 15/41410
Sentence 67/41410
WARNING: Translation model update concurrency error
Sentence 279/41410
Sentence 911/41410
WARNING: Translation model update concurrency error
Sentence 2218/41410
Sentence 3908/41410
Sentence 5776/41410
Sentence 7744/41410
Sentence 9737/41410
Sentence 11746/41410
Sentence 13767/41410
Sentence 15780/41410
Sentence 17802/41410
Sentence 19841/41410
Sentence 21912/41410
Sentence 24000/41410
Sentence 26120/41410
Sentence 28239/41410
Sentence 30359/41410
Sentence 32490/41410
Sentence 34634/41410
Sentence 36776/41410
Sentence 38928/41410
... 40883 lines omitted ...
} [19s, cum. 19s]
Iteration 2/5 {
Sentence 1/41410
Sentence 5/41410
Sentence 4/41410
ERROR:
java.lang.AssertionError:
fig.basic.StringDoubleMap.put(StringDoubleMap.java:72)
fig.basic.StringDoubleMap.switchMapType(StringDoubleMap.java:309)
fig.basic.StringDoubleMap.find(StringDoubleMap.java:386)
fig.basic.StringDoubleMap.incr(StringDoubleMap.java:78)
fig.basic.String2DoubleMap.incr(String2DoubleMap.java:51)
edu.berkeley.nlp.wordAlignment.SentencePairState.updateTransProbs(SentencePairState.java:79)
edu.berkeley.nlp.wordAlignment.distortion.Model1or2SentencePairState.updateNewParams(Model1or2SentencePairState.java:91)
edu.berkeley.nlp.wordAlignment.EMWordAligner$1.run(EMWordAligner.java:232)
edu.berkeley.nlp.concurrent.WorkQueue$1.run(WorkQueue.java:70)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
java.lang.Thread.run(Thread.java:636)
... 25 lines omitted ...
}
Sentence 28/41410
Sentence 29/41410
Sentence 30/41410
Sentence 31/41410
Sentence 32/41410
Sentence 33/41410
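For anyone digging into the code: both stack traces end in concurrent fig.basic.StringDoubleMap.incr() calls from EMWordAligner worker threads, right after "Translation model update concurrency error" warnings, which suggests unsynchronized updates to a shared map. As a hypothetical illustration only (this is NOT BerkeleyAligner's code, and the class and method names here are mine), the sketch below shows the kind of per-key atomic increment that avoids lost or corrupted updates under that access pattern, using ConcurrentHashMap.merge instead of a plain get-then-put:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: 6 worker threads (mirroring numThreads 6) each add 1.0
// to a shared per-word count 10,000 times. merge() is atomic per key, so no
// increment is lost; a plain HashMap get-then-put here could drop updates or
// corrupt the map's internal state under the same load.
public class SafeIncrDemo {
    static double run() {
        ConcurrentHashMap<String, Double> transProbs = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(6);
        for (int t = 0; t < 6; t++) {
            pool.submit(() -> {
                for (int i = 0; i < 10_000; i++) {
                    // Atomic read-modify-write for this key.
                    transProbs.merge("word", 1.0, Double::sum);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return transProbs.get("word");
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 60000.0 on every run
    }
}
```

If StringDoubleMap really is being resized or probed concurrently without locking (the switchMapType frame in the second trace is suggestive), that would also explain why the failure moves around between runs.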
_______________________________________________
Moses-support mailing list
[email protected] [3]
http://mailman.mit.edu/mailman/listinfo/moses-support [4]

Links:
------
[1] mailto:[email protected]
[2] mailto:[email protected]
[3] mailto:[email protected]
[4] http://mailman.mit.edu/mailman/listinfo/moses-support