[
https://issues.apache.org/jira/browse/JOSHUA-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435133#comment-15435133
]
Lewis John McGibbney commented on JOSHUA-304:
---------------------------------------------
It may help to post the options available in the current Berkeley aligner jar,
which was built when I installed Joshua:
{code}
lmcgibbn@LMC-032857 /usr/local/incubator-joshua(master) $ java -jar ./lib/berkeleyaligner.jar -help
Usage:
log.maxIndLevel < int> : Maximum indent level. [10]
log.msPerLine < int> : Maximum number of milliseconds
between consecutive lines of output. [1000]
log.file < str> : File to write log. []
log.stdout < bool> : Whether to output to the console.
[true]
log.note < str> : Dummy placeholder for a comment []
log.forcePrint < bool> : Force printing from logs* [false]
log.maxPrintErrors < int> : Maximum number of errors (via
error()) to print [10000]
EMWordAligner.nullProb < dbl> : How to assign null-word
probabilities (=1 means 1/n) [1.0E-6]
EMWordAligner.usePosteriorDecoding < bool> : Use posterior decoding
(recommended for best performance). [true]
EMWordAligner.posteriorDecodingThreshold < dbl> : Threshold in [0,1] for
deciding whether an alignment should exist. [0.5]
EMWordAligner.mergeConsiderNull < bool> : When merging expected sufficient
statistics, take into account the NULL (fix). [false]
EMWordAligner.handleUnknownWords < bool> : Don't crash with unknown words
(better to train on test set). [false]
EMWordAligner.priorFraction < dbl> : Fraction of a count to add for links
in dictionary prior (1 works well). [0.0]
EMWordAligner.numThreads < int> : Number of concurrent threads to use
during E-step (set to number of processors). [1]
EMWordAligner.safeConcurrency < bool> : Safe concurrency (gets rid of
concurrency warnings at the expense of speed) [false]
EMWordAligner.evaluateDuringTraining < bool> : Whether to evaluate the model
after each training iteration (slower, more memory). [false]
TreeWalkModel.usePushProbabilities < bool> : Separate parameters for moving
and pushing. [true]
TreeWalkModel.conditionOnTag < bool> : Whether to condition distortion on
the tag types. [true]
TreeWalkModel.cacheTreePaths < bool> : Whether to cache paths through trees
(uses lots of memory; faster). [false]
Evaluator.searchForThreshold < bool> : Evaluate using line search [false]
Evaluator.thresholdIntervals < int> : Sets the number of intervals for
posterior threshold line search [20]
Evaluator.saveAlignmentObjects < bool> : Save object files for proposed
alignments (large files) [false]
Main.trainSources < str*> : Directories or files containing
training files. [example/train]
Main.testSources < str*> : Directory or file containing testing
files. [example/test]
Main.sentences < int> : Maximum number of the training
sentences to use [2147483647]
Main.offsetTrainingSentences < int> : Skip this number of the first
training sentences [0]
Main.maxTestSentences < int> : Maximum number of the test sentences
to use [2147483647]
Main.offsetTestSentences < int> : Skip this number of the first test
sentences [0]
Main.foreignSuffix < str> : Foreign language file suffix [f]
Main.englishSuffix < str> : English language file suffix [e]
Main.itgTrainTestSplitPoint < int> : When writing test (ITG) posteriors,
where to divide train/test data? [0]
Main.itgInputDir < str> : What directory should we dump ITG
test data to? []
Main.reverseAlignments < bool> : Reverse test set alignments (i.e.,
foreign to english) [false]
Main.oneIndexed < bool> : Are alignments one-indexed (default
== no, 0-indexed) [false]
Main.lowercaseWords < bool> : Convert all words to lowercase
[false]
Main.leaveTrainingOnDisk < bool> : Don't load and store the training
set upfront (slower, but less memory) [false]
Main.saveRejects < bool> : Save rejected sentence pairs [false]
Main.forwardModels <enum*> : Which word alignment model to use in
the forward direction. [MODEL1 HMM]
Main.reverseModels <enum*> : Which word alignment model to use in
the backward direction. [MODEL1 HMM]
Main.iters < int*> : Number of iterations to run the
model. [5 5]
Main.mode <enum*> : Whether to train the two models
jointly or independently. [JOINT JOINT]
Main.trainingCacheMaxSize < int> : Max sentence length for caching the
HMM trellis (efficiency only). [100]
Main.loadParamsDir < str> : Directory to load parameters from. []
Main.loadLexicalModelOnly < bool> : When true, the lexical model is
loaded, but the distortion model is not. [true]
Main.saveParams < bool> : Whether to save parameters. [true]
Main.saveAlignOutput < bool> : Whether to save test alignments
produced by the system. [true]
Main.alignTraining < bool> : Produce two GIZA files and a Pharaoh
file for translation [false]
Main.writePosteriors < bool> : Produce posterior alignment weight
file when aligning training (lots of disk space) [false]
Main.writePosteriorsThreshold < dbl> : In outputting posteriors, where do
we threshold them (0.0 == all posteriors) [0.0]
Main.saveLexicalWeights < bool> : Produce two lexical translation
tables for lexical weighting (unsupported) [false]
Main.competitiveThresholding < bool> : Use competitive thresholding to
eliminate distributed many-to-one alignments [false]
Main.evaluateDirectionalModels < bool> : Evaluate directional models alone
[false]
Main.evaluateHardCombination < bool> : Evaluate hard alignment combinations
[false]
Main.evaluateSoftCombination < bool> : Evaluate soft alignment combinations
[false]
Main.dictionary < str> : Bilingual dictionary file (e.g.,
en-ch.dict) [example/en-ch.dict]
Main.splitDefinitions < bool> : Breaks up multi-word definitions and
enters each word into the dictionary map [false]
Main.rantOutput < bool> : Output a lot of junk (largely
unsupported) [false]
exec.create < bool> : Whether to create a directory for
this run; if not, don't generate output files [false]
exec.monitor < bool> : Whether to create a thread to
monitor the status. [false]
exec.execDir < str> : Directory to put all output files;
if blank, use execPoolDir. []
exec.execPoolDir < str> : Directory which contains all the
executions (or symlinks). []
exec.actualExecPoolDir < str> : Directory which actually holds the
executions. []
exec.overwriteExecDir < bool> : Overwrite the contents of the
execDir if it doesn't exist (e.g., when running a thunk). [false]
exec.useStandardExecPoolDirStrategy < bool> : Assume in the run directory,
automatically set execPoolDir and actualExecPoolDir [false]
exec.printOptionsAndExit < bool> : Simply print options and exit.
[false]
exec.miscOptions < str*> : Miscellaneous options (written to
options.map and output.map, displayed in servlet); example: a=3 b=4 []
exec.addToView < str*> : Name of the view to add this
execution to in the servlet []
exec.recordPath < str> : Record file to write to []
exec.charEncoding < str> : Character encoding []
exec.jarFiles < str*> : Name of jar files to load prior to
execution []
exec.dontInitializeJars < bool> : Skip initialization of jars [false]
exec.initializeJarsAfterDirCreation < bool> : Initialize from jars after
copying them to a newly created execDir [false]
exec.makeThunk < bool> : Make a thunk (a delayed
computation). [false]
exec.thunkAutoQueue < bool> : A note to the servlet to
automatically run the thunk when it sees it [false]
exec.thunkPriority < int> : Priority of the thunk. [0]
exec.thunkMainClassName < str> : Launch this class []
exec.thunkJavaOpts < str> : Java options to pass to Java when
later running the thunk []
exec.thunkUseScala < bool> : Use Scala to run rather than Java
[false]
exec.thunkReqMemory < int> : Use Scala to run rather than Java
(in MB) [1024]
exec.dontCatchExceptions < bool> : Whether to catch exceptions (ignored
when making a thunk) [false]
{code}
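As a rough aid for reading the listing above, here is a small Python sketch (not part of Joshua or the Berkeley aligner) that picks out the options whose type marker ends in `*`, which the help output appears to use for list-valued options. The function name `list_valued_options` is hypothetical, and the sample lines are wrapped lines from the listing rejoined by hand; the regular expression assumes each option sits on a single line.

```python
import re

# Matches a rejoined help line such as:
#   Main.iters < int*> : Number of iterations to run the model. [5 5]
# Assumption: name, "<type>", " : ", description and "[default]" share one line.
HELP_LINE = re.compile(r"^(\S+)\s+<\s*(\w+)(\*?)>\s+:\s+(.*)\[(.*)\]\s*$")


def list_valued_options(lines):
    """Return {option name: default string} for options typed like <int*>."""
    found = {}
    for line in lines:
        m = HELP_LINE.match(line.strip())
        if m and m.group(3) == "*":
            found[m.group(1)] = m.group(5)
    return found


# Sample lines copied from the help output, with the mail wrapping undone.
sample = [
    "Main.forwardModels <enum*> : Which word alignment model to use in the forward direction. [MODEL1 HMM]",
    "Main.iters < int*> : Number of iterations to run the model. [5 5]",
    "Main.mode <enum*> : Whether to train the two models jointly or independently. [JOINT JOINT]",
    "Main.sentences < int> : Maximum number of the training sentences to use [2147483647]",
]

print(list_valued_options(sample))
```

On these sample lines the sketch reports `Main.forwardModels`, `Main.iters` and `Main.mode` with the defaults `MODEL1 HMM`, `5 5` and `JOINT JOINT`, the same values that appear in the error messages in the issue description.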
> word-align.conf alignment template file not compatible with berkeley aligner
> ----------------------------------------------------------------------------
>
> Key: JOSHUA-304
> URL: https://issues.apache.org/jira/browse/JOSHUA-304
> Project: Joshua
> Issue Type: Bug
> Components: alignment, berkeley, templates
> Affects Versions: 6.0.5
> Reporter: Lewis John McGibbney
> Priority: Blocker
> Fix For: 6.1
>
>
> It took me quite some time to debug why pipelines were failing when using the
> Berkeley aligner.
> It turns out that the word-align.conf template provided at
> https://github.com/apache/incubator-joshua/blob/master/scripts/training/templates/alignment/word-align.conf
> is not compatible with the Berkeley aligner.
> In particular, the following lines are incompatible:
> https://github.com/apache/incubator-joshua/blob/master/scripts/training/templates/alignment/word-align.conf#L12-L15
> Evidence of this is provided below:
> {code}
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64
> -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar
> ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Invalid enum: 'MODEL1 HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64
> -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar
> ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Invalid enum: 'MODEL1, HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64
> -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar
> ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Invalid enum: 'MODEL1 HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64
> -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar
> ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Invalid enum: 'JOINT JOINT'; valid choices: FORWARD|REVERSE|BOTH_INDEP|JOINT
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64
> -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar
> ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Exception in thread "main" java.lang.NumberFormatException: For input string:
> "5 5"
> at
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Integer.parseInt(Integer.java:580)
> at java.lang.Integer.parseInt(Integer.java:615)
> at
> edu.berkeley.nlp.fig.basic.OptInfo.interpretValue(OptionsParser.java:143)
> at
> edu.berkeley.nlp.fig.basic.OptInfo.interpretValue(OptionsParser.java:240)
> at edu.berkeley.nlp.fig.basic.OptInfo.set(OptionsParser.java:294)
> at
> edu.berkeley.nlp.fig.basic.OptionsParser.readOptionsFile(OptionsParser.java:555)
> at
> edu.berkeley.nlp.fig.basic.OptionsParser.doParse(OptionsParser.java:604)
> at edu.berkeley.nlp.fig.exec.Execution.init(Execution.java:293)
> at edu.berkeley.nlp.wordAlignment.Main.main(Main.java:149)
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64
> -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar
> ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Cannot create directory: alignments/0
> {code}
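The errors quoted above are consistent with each multi-valued setting reaching the parser as one string rather than as separate tokens. The sketch below is Python, not the aligner's actual `OptionsParser`; `check_whole` and `check_tokens` are hypothetical helpers, and the valid choices are copied from the error messages. It only illustrates that each token is valid on its own while the combined string is not; it does not establish what the correct word-align.conf syntax is.

```python
# Valid choices copied from the error messages quoted above.
MODEL_CHOICES = {"MODEL1", "MODEL2", "HMM", "SYNTACTIC", "NONE"}
MODE_CHOICES = {"FORWARD", "REVERSE", "BOTH_INDEP", "JOINT"}


def check_whole(value, choices):
    """Treat the entire value as a single enum, as the errors suggest."""
    return value in choices


def check_tokens(value, choices):
    """Hypothetical alternative: split on whitespace and check each token."""
    tokens = value.split()
    return bool(tokens) and all(t in choices for t in tokens)


for value, choices in [("MODEL1 HMM", MODEL_CHOICES), ("JOINT JOINT", MODE_CHOICES)]:
    print(repr(value),
          "whole:", check_whole(value, choices),
          "per token:", check_tokens(value, choices))

# The iteration count fails in the same way: "5 5" is not one integer,
# although each whitespace-separated token is.
try:
    int("5 5")
except ValueError as err:
    print("int('5 5') fails:", err)
print([int(t) for t in "5 5".split()])
```

Note that the comma-separated attempt (`MODEL1, HMM`) fails both checks, since `MODEL1,` is not a valid choice.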