Hi Judah

The tokeniser also escapes some characters which have special meaning for Moses, and at decoding time the most important of these is the pipe (|). A stray pipe probably caused Moses to fail for you, but URLs shouldn't contain pipes.
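The pipe matters because Moses uses it as the factor separator in its input. A raw `|` can therefore be read as factor structure rather than as text. The sketch below shows the kind of escaping involved. It is a minimal Python illustration, and the character-to-entity table is my understanding of what tokenizer.perl applies and detokenizer.perl undoes. Treat the exact table as an assumption, not a copy of the Moses scripts.

```python
# Minimal sketch of XML-style escaping of characters that are special to
# Moses. ASSUMPTION: this table mirrors tokenizer.perl's escape step as far
# as I know it; it is not an authoritative copy of the Moses code.

# Order matters: "&" is escaped first so that the entities introduced for
# the other characters are not themselves escaped a second time.
ESCAPES = [
    ("&", "&amp;"),    # the escape character itself
    ("|", "&#124;"),   # Moses factor separator
    ("<", "&lt;"),     # XML markup
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
    ("[", "&#91;"),    # brackets used for syntax non-terminals
    ("]", "&#93;"),
]


def escape(text: str) -> str:
    """Replace characters that are special to Moses with entities."""
    for char, entity in ESCAPES:
        text = text.replace(char, entity)
    return text


def unescape(text: str) -> str:
    """Undo escape(); "&amp;" is restored last so it cannot form new entities."""
    for char, entity in reversed(ESCAPES):
        text = text.replace(entity, char)
    return text


if __name__ == "__main__":
    line = "grep 'zip' data.txt | head"
    escaped = escape(line)
    print(escaped)  # grep &apos;zip&apos; data.txt &#124; head
    assert unescape(escaped) == line
```

If only path names and URLs are left untokenised, they still need this escaping step, which is why a stray pipe elsewhere in the input could still cause trouble.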


cheers - Barry

On 15/07/14 13:59, Judah Schvimer wrote:
Hi,

Thank you very much! That's incredibly helpful. My one concern is that before I tokenized the input to the decoder, it was crashing. Do you know which tokens would cause that behavior if left in? Would you recommend leaving path names and URLs untokenized and tokenizing everything else as before?

Judah


On Tue, Jul 15, 2014 at 4:02 AM, Barry Haddow <[email protected]> wrote:

    Hi Judah

    The actual problem here is that you do not want path names split
    by the tokeniser. It's only really set up to deal with regular
    text, but what you can do is ask it to "protect" certain patterns
    by using the

    -protected <filename>

    argument. The file <filename> should contain a list of regular
    expressions (one per line), and the tokeniser will not split apart
    any tokens which match these REs. I'm guessing that in the example
    below you don't want "tutorial" translated into the target
    language; if the tokeniser doesn't split the path, the
    whole thing will pass through untranslated as an OOV.
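To make the effect of protected patterns concrete, here is a minimal Python sketch of the idea: matches of each protected regular expression are swapped for an opaque placeholder, the text is tokenised, and the original strings are put back. Several pieces are assumptions for illustration. `naive_tokenise` is a hypothetical stand-in for the real tokeniser, and the placeholder name and the two patterns in `PROTECTED_PATTERNS` are invented examples of what a `-protected` file might contain. I believe tokenizer.perl does something similar internally, but this is not its code.

```python
import re

# HYPOTHETICAL contents of the file passed via "-protected <filename>":
# one regular expression per line. Illustrative patterns only.
PROTECTED_PATTERNS = [
    r":doc:`[^`]*`",   # Sphinx-style :doc:`...` references
    r"https?://\S+",   # URLs
]


def naive_tokenise(text: str) -> str:
    """Rough stand-in for a tokeniser: split off punctuation except hyphens."""
    text = re.sub(r"([^\w\s-])", r" \1 ", text)
    return " ".join(text.split())


def tokenise_with_protection(text: str, patterns: list[str]) -> str:
    """Tokenise text, keeping any substring matching a pattern intact."""
    protected: list[str] = []

    def stash(match: re.Match) -> str:
        # Remember the original string and return an all-word-character
        # placeholder that the tokeniser will leave as a single token.
        protected.append(match.group(0))
        return f"THISISPROTECTED{len(protected) - 1:03d}"

    for pattern in patterns:
        text = re.sub(pattern, stash, text)
    text = naive_tokenise(text)
    # Put the protected strings back in place of their placeholders.
    for i, original in enumerate(protected):
        text = text.replace(f"THISISPROTECTED{i:03d}", original)
    return text


if __name__ == "__main__":
    src = "See :doc:`/tutorial/aggregation-zip-code-data-set` now."
    print(naive_tokenise(src))
    # See : doc : ` / tutorial / aggregation-zip-code-data-set ` now .
    print(tokenise_with_protection(src, PROTECTED_PATTERNS))
    # See :doc:`/tutorial/aggregation-zip-code-data-set` now .
```

Because the protected span reaches the decoder as one unknown token, it is copied through unchanged, and the detokeniser has no spaces to repair inside the path.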

    cheers - Barry


    On 14/07/14 16:53, Judah Schvimer wrote:

        Hi,

        When I'm using the decoder, I have to tokenize my input
        sentences before I translate them. However, when I detokenize
        the output, it leaves awkward spaces around what was tokenized.
        Is there any way to fix this? It seems to happen mainly around
        slashes and colons.

        Source: :doc:`/tutorial/aggregation-zip-code-data-set`
        Target: : Doc: '/ tutorial / aggregation-zip-code-data-set'

        Thanks,
        Judah


        _______________________________________________
        Moses-support mailing list
        [email protected]
        http://mailman.mit.edu/mailman/listinfo/moses-support



--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


