Moses isn't confused. Square brackets are reserved characters, and the user is responsible for escaping them. To review, these are the characters users must escape (for various reasons), both in the input you pass to the Moses binary at runtime and in your source/target training corpus. Moses' built-in tokenizer.perl has a command-line option to enable or disable escaping them. There are also separate utilities that do the escaping on their own.

   &
   "
   '
   <
   >
   |
   [
   ]
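
If you would rather escape a corpus yourself than rely on the tokenizer, the replacement can be sketched in a few lines of Python. This is only an illustration: the entity strings are what I understand the Moses tokenizer to emit, so check them against the scripts shipped with your Moses version. The names MOSES_ESCAPES, escape_for_moses and unescape_from_moses are made up for this example and are not part of Moses.

```python
# Assumed mapping from reserved characters to escaped entities.
# Verify against the tokenizer scripts in your own Moses checkout.
MOSES_ESCAPES = [
    ("&", "&amp;"),   # must come first so later entities are not re-escaped
    ("|", "&#124;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
    ("[", "&#91;"),
    ("]", "&#93;"),
]


def escape_for_moses(text):
    """Replace each reserved character with its escaped entity."""
    for raw, escaped in MOSES_ESCAPES:
        text = text.replace(raw, escaped)
    return text


def unescape_from_moses(text):
    """Undo escape_for_moses; '&amp;' is restored last."""
    for raw, escaped in reversed(MOSES_ESCAPES):
        text = text.replace(escaped, raw)
    return text


if __name__ == "__main__":
    print(escape_for_moses("to like [someone]"))
```

The same escaping has to be applied to the training corpus and to the text sent to the decoder, otherwise the phrase table and the input will not match.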

Did I miss any or has anything changed?

Tom


On 4/28/2016 9:58 PM, [email protected] wrote:
Date: Tue, 26 Apr 2016 10:24:48 +0800
From: Markus Saers<[email protected]>
Subject: [Moses-support] Phrases containing brackets mistaken for
malformed nonterminals
To:"[email protected]"  <[email protected]>

Hello,

I am having problems reading in a phrase table derived from a corpus
that (I have learned now) contained bracketed expressions such as "to
like [someone]". It appears that Moses confuses these strings with
nonterminals. I built a regular phrase-based model, so I was a bit
confused when it contained malformed nonterminals. Is there any way to
tell Moses that this is a regular phrase-based model, and that it
(he?) shouldn't look for nonterminals?

/Markus

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
