Yes, I only ran $SCRIPTS_ROOTDIR/training/clean-corpus-n.perl on
both the English and Arabic training sets.
I tokenized the English part with the Moses tokenizer, using the
lowercase.perl and tokenizer.perl scripts.
The Arabic part is tokenized using the MADA tool, and the characters
<, >, and | were all normalized into Latin characters.
I suspect that some weird characters are appearing in the corpus.
When I have such a case, the training script usually prints the
sentence containing those characters and stops building the phrase
table. Previously, I would manually remove the sentence from the
training data and the alignment file and continue running the
training script, which then finished successfully, along with tuning
and decoding.
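A minimal sketch of that manual removal, assuming the two sides of the
corpus and the alignment file are line-aligned files. The helper names
drop_lines and drop_from_files and the ".clean" suffix are hypothetical,
not part of Moses:

```python
def drop_lines(lines, bad_indices):
    """Return the lines with the given 0-based indices removed."""
    bad = set(bad_indices)
    return [line for i, line in enumerate(lines) if i not in bad]


def drop_from_files(paths, bad_indices, suffix=".clean"):
    """Write a copy of each line-aligned file without the offending lines.

    Files are handled as raw bytes so that invalid characters in the
    corpus cannot cause a decoding error while cleaning.
    """
    for path in paths:
        with open(path, "rb") as src:
            kept = drop_lines(src.readlines(), bad_indices)
        with open(path + suffix, "wb") as dst:
            dst.writelines(kept)
```

Passing the same index list for the source file, the target file and
the alignment file keeps the three in step with each other.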
In this case I can see that the training ended successfully, and all
the following files were generated:
419M model/aligned.0,1.ar
224M model/aligned.0.ar
256M model/aligned.0.en
236M model/aligned.grow-diag-final-and
1.2G model/extract.0-0,1.inv.sorted.gz
1.2G model/extract.0-0,1.sorted.gz
870M model/extract.0-0.o.sorted.gz
92M model/lex.0-0,1.e2f
92M model/lex.0-0,1.f2e
4.0K model/moses.ini
2.6G model/phrase-table.0-0,1.gz
931M model/reordering-table.0-0.wbe-msd-bidirectional-fe.gz
The error occurs only when the phrase table is loaded from the
filtered directory:
4.0K filtered/info
260K filtered/input.1002
4.0K filtered/moses.ini
235M filtered/phrase-table.0-0,1.1.1.gz
957M filtered/reordering-table.0-0.wbe-msd-bidirectional-fe
I even tried decoding alone, without MERT, on the test set after
filtering the phrase table with the filter-model-given-input.pl
script, and it gave the same error.
If that is the case, is there a way to know on which phrase pair
loading failed?
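One way to narrow this down outside the decoder is to scan the phrase
table for lines that do not have the expected shape. The following is
only a sketch under assumptions: it assumes the usual text format
"source ||| target ||| scores ||| ..." with factors joined by "|", one
source factor and two target factors (as in --translation-factors
0-0,1). The names check_line and scan are hypothetical, and the checks
are heuristics, not the decoder's actual loading logic:

```python
import gzip


def check_line(line, src_factors=1, tgt_factors=2):
    """Return a reason string if the line looks malformed, else None."""
    if "\ufffd" in line:
        return "invalid UTF-8 bytes (or a replacement character)"
    fields = line.rstrip("\n").split(" ||| ")
    if len(fields) < 3:
        return "fewer than 3 '|||'-separated fields"
    sides = (("source", fields[0], src_factors),
             ("target", fields[1], tgt_factors))
    for side, phrase, n in sides:
        tokens = phrase.split()
        if not tokens:
            return f"empty {side} phrase"
        for tok in tokens:
            parts = tok.split("|")
            if len(parts) != n or "" in parts:
                return f"{side} token {tok!r} lacks {n} non-empty factor(s)"
    scores = fields[2].split()
    if not scores:
        return "no scores"
    try:
        [float(s) for s in scores]
    except ValueError:
        return "non-numeric score"
    return None


def scan(path, **kwargs):
    """Yield (line_number, reason, line) for each suspicious line."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as f:
        for n, line in enumerate(f, 1):
            reason = check_line(line, **kwargs)
            if reason:
                yield n, reason, line.rstrip("\n")
```

For example, looping over scan("filtered/phrase-table.0-0,1.1.1.gz")
and printing each result would list candidate lines. A stray "|" left
inside a token would show up as a wrong factor count. A clean report
would not rule out other causes of the crash.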
On Sat, Oct 18, 2014 at 10:58 AM, Hieu Hoang <[email protected]> wrote:
the moses.ini looks ok. Did you clean your training data? Did you
tokenize it with the moses tokenizer? Did you do anything to your
phrase-table?
On 18 October 2014 17:49, Mohammad Salameh <[email protected]> wrote:
Hi Hieu
Please find the moses.ini file attached
the exact commands are:
####TRAIN TM
$SCRIPTS_ROOTDIR/training/train-model.perl -root-dir $WORK
-external-bin-dir $MGIZA_HOME -corpus $WORK/corpus/trn.fil -f
en -e ar -alignment grow-diag-final-and -max-phrase-length 8
--translation-factors 0-0,1 --alignment-factors 0-1
-reordering msd-bidirectional-fe -mgiza -lm
0:5:$WORK/lm/ar_surf.lm &>$WORK/training.out
####TUNE
mkdir $WORK/tuning/mertA
$SCRIPTS_ROOTDIR/training/mert-moses.pl $WORK/tuning/dev.en
$WORK/tuning/dev.ar $MOSES
$WORK/model/moses.ini --working-dir $WORK/tuning/mertA
--mertdir $MOSES_HOME/bin --decoder-flags "-threads 11
-max-phrase-length 8" --threads 11 &> $WORK/tuning/mertA/mert.out
Thanks,
Mohammad
On Sat, Oct 18, 2014 at 6:20 AM, Hieu Hoang <[email protected]> wrote:
hi mohammad
On 17 October 2014 21:45, Mohammad Salameh <[email protected]> wrote:
Thanks Hieu,
I want to exclude the <s> because I want to translate
chunks of source sentences with one model, and then
add them and their scores as extra features to the phrase
table of a different model.
So I don't want the sentence boundaries to be involved
in the translation.
I understand. Moses doesn't allow you to exclude <s>.
However, if you don't want the score for this, then maybe
you should write a feature function to subtract it from
the score, or modify an existing language model to not
score <s>.
Also, I trained a factored system with
--translation-factors 0-0,1. The training process
ended successfully and I do not see any errors in the
training.out file.
But tuning and decoding both end with a segmentation
fault while loading the phrase table, when loading
reaches 3%.
I have attached the mert.out.
Would it be possible to know the reason, or which
phrases in the phrase table are causing the
interruption in loading?
Can you also send the moses.ini file you used, and the
EXACT command you executed.
Thanks,
Salameh
On Fri, Oct 17, 2014 at 12:57 PM, Hieu Hoang <[email protected]> wrote:
sorry, must have missed your email. Answers below
On 16/10/14 20:21, Mohammad Salameh wrote:
Hi,
Any answers to the above questions?
Thanks,
Salameh
On Fri, Oct 10, 2014 at 10:11 AM, Mohammad Salameh <[email protected]> wrote:
Hi
I have a few questions on how the Moses system works.
1) Would it be possible to do a factored
translation where factors appear in the
output but are not part of the translation
process? For example, I have the English surface
form on the source side and the Arabic surface forms
and their stems on the target side. I want to
translate from the English surface form to the Arabic
surface form, but also see the stems accompanying
the surface forms in the output.
I have tried setting --translation-factors
0-0, but only ended up with the Arabic
surface forms in the output.
I'm not sure what you mean by 'not be part of the
translation process'. If you want to see the stem
in the output but you don't want it in the
translation table, then there needs to be some
process that generates the stem, given the target
word. Moses has a crude solution - it is called
the generation step.
2) When translating sentences with Moses, I
assume that Moses adds the sentence boundary
markers <s> </s> automatically. Would it be
possible to exclude these from the
translation? I need to get translation scores
for chunks of input sentences that do not
involve scores generated based on <s> and
</s> from the LM or phrase table.
Yes, it includes <s> </s>. No, you can't exclude
these from the translation process.
I'm curious to know why you want to exclude these.
3) I added additional phrases to the phrase
table. Should the phrase table be sorted
again, and is it enough to run "LC_ALL=C sort"
on the PT for it to be used properly?
Yes, it needs to be sorted again. You must also
make sure that the new phrases are not duplicates
of existing phrases.
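As an illustration of both points, here is a minimal sketch that sorts
phrase-table lines by raw byte value (the ordering the C locale uses)
and reports phrase pairs that occur more than once. It assumes lines of
the form "source ||| target ||| ..." held in memory as bytes without
the trailing newline. The name sort_and_check is hypothetical:

```python
def sort_and_check(lines):
    """Sort lines bytewise and list duplicated (source, target) pairs.

    `lines` is an iterable of bytes objects, one per phrase-table entry.
    Python compares bytes by unsigned byte value, which matches a
    whole-line comparison in the C locale.
    """
    ordered = sorted(lines)
    dups = []
    prev_key = None
    for line in ordered:
        # Entries with the same source and target share a prefix, so
        # after sorting they sit next to each other.
        key = tuple(line.split(b" ||| ")[:2])
        if key == prev_key:
            dups.append(key)
        prev_key = key
    return ordered, dups
```

This holds the whole table in memory, so for a table of several
gigabytes the external "LC_ALL=C sort" command remains the practical
tool. The sketch is mainly useful for checking a small set of added
phrases against the entries they might collide with.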
Thanks
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu