Hi Ulrich,
Thanks for the detailed answer. I'll try that and follow up on the
progress of this work.
Shachar
On 03/18/2014 07:50 PM, Ulrich Germann wrote:
PhraseDictionaryDynSuffixArray is deprecated and should not be used
any more. It will be replaced with memory-mapped suffix array phrase
tables (mmsapt) which are currently in the branch dynamic-phrase-tables.
In order to use them, you need:
- the two text files, one sentence per line
- the word alignments in symal format
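A quick sanity check before building anything can save some time. The following is only a sketch: check_parallel is a made-up helper, not a Moses tool, and it assumes each file ends with a newline. Since every line describes one sentence pair, the two text files and the alignment file should all have the same number of lines.

```shell
#!/bin/sh
# Hypothetical helper (not part of Moses): verify that the source text,
# the target text and the symal alignment file have equal line counts.
# Usage: check_parallel train.fr train.en train.symal
check_parallel() {
    for f in "$1" "$2" "$3"; do
        [ -r "$f" ] || { echo "cannot read $f" >&2; return 2; }
    done
    # Some wc implementations pad the count with blanks; strip them.
    n_src=$(wc -l < "$1" | tr -d '[:space:]')
    n_tgt=$(wc -l < "$2" | tr -d '[:space:]')
    n_aln=$(wc -l < "$3" | tr -d '[:space:]')
    if [ "$n_src" -eq "$n_tgt" ] && [ "$n_src" -eq "$n_aln" ]; then
        echo "ok: $n_src lines each"
    else
        echo "line counts differ: $1=$n_src $2=$n_tgt $3=$n_aln" >&2
        return 1
    fi
}
```

The function returns 0 when the counts agree, 1 when they differ, and 2 when a file cannot be read.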
Let fr be the tag for the language you are translating from
and en the tag for the language you are translating to:
cat train.fr | mtt-build -i -o train.fr
cat train.en | mtt-build -i -o train.en
cat train.symal | symal2mam train.fr-en.mam
mmlex-build train fr en -o train.fr-en.lex -c train.fr-en.coc
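Put together, the four steps might be wrapped as follows. This is only a sketch: the function name and the dry-run behaviour are made up for illustration, and the commands and flags are exactly the ones quoted above, nothing more. It prints the commands instead of running them, and it assumes the base name and language tags contain no whitespace.

```shell
#!/bin/sh
# Hypothetical wrapper: print the build commands from this mail for a
# corpus base name and a language pair, one command per line.
# It does not run anything itself.
# Usage: mmsapt_build_steps train fr en
mmsapt_build_steps() {
    base=$1; l1=$2; l2=$3
    echo "cat ${base}.${l1} | mtt-build -i -o ${base}.${l1}"
    echo "cat ${base}.${l2} | mtt-build -i -o ${base}.${l2}"
    echo "cat ${base}.symal | symal2mam ${base}.${l1}-${l2}.mam"
    echo "mmlex-build ${base} ${l1} ${l2} -o ${base}.${l1}-${l2}.lex -c ${base}.${l1}-${l2}.coc"
}
```

Once the printed commands look right, something like "mmsapt_build_steps train fr en | sh" would execute them in order.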
then in moses.ini, the line for the phrase table should look like this:
Mmsapt name=PT0 output-factor=0 num-features=5 base=/path/to/train L1=fr L2=en
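If moses.ini is generated by a script, the entry could be produced along these lines. Again only a sketch: mmsapt_ini_line is a made-up helper, and the parameter values are copied from the line above rather than checked against the code in the branch. The whole entry is written on a single line.

```shell
#!/bin/sh
# Hypothetical helper: print the Mmsapt feature line for moses.ini.
# name, output-factor and num-features are fixed to the values from
# this mail; only the base path and the two language tags vary.
# Usage: mmsapt_ini_line /path/to/train fr en
mmsapt_ini_line() {
    base=$1; l1=$2; l2=$3
    printf 'Mmsapt name=PT0 output-factor=0 num-features=5 base=%s L1=%s L2=%s\n' \
        "$base" "$l1" "$l2"
}
```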
No guarantee that this works; this is work in progress. Probably won't
work on Mac, and works in multi-threaded mode only.
- Uli
On Mon, Mar 17, 2014 at 4:17 PM, Mirkin, Shachar
<[email protected]> wrote:
Hi,
I'm now also subscribed from this email address.
Let me give more details about the problems that I encountered.
Trying to load the Moses server with the modified ini file, after
replacing the PhraseDictionaryBinary line with:
PhraseDictionaryDynSuffixArray source=<path-to-source-corpus>
target=<path-to-target-corpus> alignment=<path-to-alignments>
(with the correct paths, of course), I got:
Feature function PhraseDictionaryDynSuffixArray0 specified 1 dense
scores or weights. Actually has 0
This was solved by adding "num-features=0" to the
PhraseDictionaryDynSuffixArray line.
The next error was:
...
Loading source corpus...
terminate called after throwing an instance of
'Moses::StrayFactorException'
what(): moses/Word.cpp:112 in void
Moses::Word::CreateFromString(Moses::FactorDirection, const
std::vector<long unsigned int, std::allocator<long unsigned int>
>&, const StringPiece&, bool) threw StrayFactorException because
`fit'.
You have configured 0 factors but the word le contains factor
delimiter | too many times.
In this test my source, target and alignment files each consist of
a single line with no "|"s, and the word "le" is the first one in
the source.
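For reference, the absence of the factor delimiter can be confirmed with a check like the one below. It is only a sketch: has_factor_delimiter is a made-up name, and it looks for a literal "|" only, not for any other delimiter that might be configured.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if any of the given files contains a
# literal "|", printing each hit as file:line-number:line.
# Exits 1 when no file contains the character.
# Usage: has_factor_delimiter train.fr train.en train.symal
has_factor_delimiter() {
    # /dev/null is added so grep prints the file name even when only
    # one real file is given.
    grep -n -F '|' /dev/null "$@"
}
```

A clean set of files produces no output, so "has_factor_delimiter <files> || echo clean" prints "clean" in that case.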
Is there anything else I should do in the ini file?
Thanks,
Shachar
On 03/17/2014 02:58 PM, Hieu Hoang wrote:
Hi Shachar
Can you please subscribe to the mailing list before posting to
it? It's a public email address, so there are a lot of automated
spammers. You can subscribe here:
http://mailman.mit.edu/mailman/listinfo/moses-support
To answer your question - the webpage does document it in the new
ini format, e.g.
PhraseDictionaryDynSuffixArray source=<path-to-source-corpus> ...
Do you have a printout of the old version?
Also, the dynamic suffix array is being updated: Uli Germann
(cc'ed) is adding more features to it. He can tell you more
about it.
---------- Forwarded message ----------
From: "Mirkin, Shachar" <[email protected]>
To: [email protected]
Cc:
Date: Mon, 17 Mar 2014 13:06:47 +0100
Subject: Incremental training and the new ini format
Hi,
I'm trying to use incremental training with the latest Moses
version, but the documentation refers to the old ini format
(http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc34).
Can you please explain what changes are required to get the
incremental training working with the new ini format?
Thanks,
Shachar
--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
--
Ulrich Germann
Research Associate
School of Informatics
University of Edinburgh
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support