Hi Ulrich,

Thanks for the detailed answer. I'll try that and follow up on the progress of this work.

Shachar

On 03/18/2014 07:50 PM, Ulrich Germann wrote:
PhraseDictionaryDynSuffixArray is deprecated and should not be used any more. It will be replaced with memory-mapped suffix array phrase tables (mmsapt) which are currently in the branch dynamic-phrase-tables.

In order to use them, you need:

- the two text files, one sentence per line
- the word alignments in symal format

Let fr be the language tag for the language you are translating from, and en the tag for the language you are translating to.

cat train.fr | mtt-build -i -o train.fr
cat train.en | mtt-build -i -o train.en
cat train.symal | symal2mam train.fr-en.mam
mmlex-build train fr en -o train.fr-en.lex -c train.fr-en.coc
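
Before running the four commands above, it may be worth checking that the three input files line up. The following is only a sketch of such a check and is not part of the recipe itself: the function name check_parallel is hypothetical, and it assumes the inputs are named <base>.<L1>, <base>.<L2> and <base>.symal, as in the commands above.

```shell
#!/bin/sh
# Hypothetical pre-flight check (an assumption, not part of the Moses tools):
# the two text files and the symal alignment file should all have the same
# number of lines, since each line describes one sentence pair.
# Usage: check_parallel <base> <L1> <L2>    e.g. check_parallel train fr en
check_parallel() {
    base=$1; src=$2; tgt=$3

    # All three inputs must exist and be readable.
    for f in "$base.$src" "$base.$tgt" "$base.symal"; do
        if [ ! -r "$f" ]; then
            echo "missing or unreadable: $f" >&2
            return 1
        fi
    done

    # wc -l counts newline characters, so a final line without a trailing
    # newline is not counted; tr strips the padding some wc versions add.
    n_src=$(wc -l < "$base.$src" | tr -d ' ')
    n_tgt=$(wc -l < "$base.$tgt" | tr -d ' ')
    n_aln=$(wc -l < "$base.symal" | tr -d ' ')

    if [ "$n_src" -ne "$n_tgt" ] || [ "$n_src" -ne "$n_aln" ]; then
        echo "line counts differ: $n_src / $n_tgt / $n_aln" >&2
        return 1
    fi
    echo "$n_src sentence pairs"
}
```

With the function defined, "check_parallel train fr en" prints the number of sentence pairs and returns 0 if the files agree; it returns non-zero otherwise, so it can be chained in front of the build commands with &&.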

then in moses.ini, the line for the phrase table should look like this:

Mmsapt name=PT0 output-factor=0 num-features=5 base=/path/to/train L1=fr L2=en

No guarantee that this works; this is work in progress. Probably won't work on Mac, and works in multi-threaded mode only.

- Uli



On Mon, Mar 17, 2014 at 4:17 PM, Mirkin, Shachar <[email protected] <mailto:[email protected]>> wrote:

    Hi,

    I'm now subscribed also from this email address.

    Let me give more details about the problems that I encountered.
    Trying to load the Moses server with the modified ini file, after
    replacing the PhraseDictionaryBinary line with:

    PhraseDictionaryDynSuffixArray source=<path-to-source-corpus> target=<path-to-target-corpus> alignment=<path-to-alignments>

    (with the correct paths, of course), I got:

    Feature function PhraseDictionaryDynSuffixArray0 specified 1 dense
    scores or weights. Actually has 0

    This was solved by adding "num-features=0" to the
    PhraseDictionaryDynSuffixArray line.

    The next error was:

    ...
    Loading source corpus...
    terminate called after throwing an instance of
    'Moses::StrayFactorException'
      what():  moses/Word.cpp:112 in void
    Moses::Word::CreateFromString(Moses::FactorDirection, const
    std::vector<long unsigned int, std::allocator<long unsigned int>
    >&, const StringPiece&, bool) threw StrayFactorException because
    `fit'.
    You have configured 0 factors but the word le contains factor
    delimiter | too many times.

    In this test my source, target and alignment files consist each of
    a single line with no "|"s, and the word "le" is the first one in
    the source.

    Is there anything else I should do in the ini file?

    Thanks,
    Shachar




    On 03/17/2014 02:58 PM, Hieu Hoang wrote:
    Hi Shachar

    Can you please subscribe to the mailing list before posting to
    it? It's a public email address, so there are a lot of automated
    spammers. You can subscribe here:
    http://mailman.mit.edu/mailman/listinfo/moses-support

    To answer your question - the webpage does document it in the new
    ini format, e.g.
    PhraseDictionaryDynSuffixArray source=<path-to-source-corpus> ...
    Do you have a printout of the old version?

    Also, the dynamic suffix array is being reworked: Uli Germann
    (cc'ed) is extending it with more features. He can tell you more
    about it.


    ---------- Forwarded message ----------
    From: <[email protected]
    <mailto:[email protected]>>
    Date: 17 March 2014 12:13
    Subject: Moses-support post from [email protected]
    <mailto:[email protected]> requires approval
    To: [email protected] <mailto:[email protected]>


    As list administrator, your authorization is requested for the
    following mailing list posting:

        List: [email protected] <mailto:[email protected]>
        From: [email protected]
    <mailto:[email protected]>
        Subject: Incremental training and the new ini format
        Reason:  Post by non-member to a members-only list

    At your convenience, visit:

    http://mailman.mit.edu/mailman/admindb/moses-support

    to approve or deny the request.


    ---------- Forwarded message ----------
    From: "Mirkin, Shachar" <[email protected]
    <mailto:[email protected]>>
    To: [email protected] <mailto:[email protected]>
    Cc:
    Date: Mon, 17 Mar 2014 13:06:47 +0100
    Subject: Incremental training and the new ini format
    Hi,

    I'm trying to use incremental training with the latest Moses
    version, but the documentation refers to the old ini format
    (http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc34).
    Can you please explain what changes are required to get the
    incremental training working with the new ini format?

    Thanks,
    Shachar




    ---------- Forwarded message ----------
    From: [email protected]
    <mailto:[email protected]>
    To:
    Cc:
    Date:
    Subject: confirm 2701c5fb8f659b6037c9e0bf07ad70095ba4ffe2
    If you reply to this message, keeping the Subject: header intact,
    Mailman will discard the held message.  Do this if the message is
    spam.  If you reply to this message and include an Approved: header
    with the list password in it, the message will be approved for
    posting
    to the list.  The Approved: header can also appear in the first line
    of the body of the reply.



-- Hieu Hoang
    Research Associate
    University of Edinburgh
    http://www.hoang.co.uk/hieu





--
Ulrich Germann
Research Associate
School of Informatics
University of Edinburgh

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
