Hi Lefty,

For the -protect option, the format is one regular expression per line. For example, if you use a file with one line like this:

http://\S+

then it should protect some URLs from tokenisation. It works for me. If you have problems then send me the file.

For the -a option, I think the detokeniser should put the hyphens back together again, but I have not checked.

cheers - Barry

On 14/10/13 19:22, Eleftherios Avramidis wrote:
> Hi,
>
> I see tokenizer.perl now offers an option for excluding URLs and other
> expressions: "-protect FILE ... specify file with patterns to be
> protected in tokenisation." Unfortunately there is no explanation of
> what format this file should have. I tried several ways of writing
> regular expressions for URLs, but URLs still come out tokenised. Could
> you provide an example?
>
> My second question concerns the -a option, for aggressive hyphen
> splitting. Does the detokenizer offer a similar option, to reconstruct
> separated hyphens?
>
> cheers
> Lefteris

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
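[Editor's note: to illustrate the one-regex-per-line protect format Barry describes, here is a small Python sketch. It is NOT the Moses tokenizer.perl implementation — it only shows which substrings a pattern like http://\S+ would match and therefore shield from tokenisation; the input sentence and function name are made up for the demo.]

```python
import re

# Each entry corresponds to one line of a -protect file,
# e.g. a file containing the single line: http://\S+
protect_patterns = ["http://\\S+"]

def find_protected(text, patterns):
    """Return the substrings that the protect patterns would match
    (and that the tokenizer would then leave intact)."""
    spans = []
    for pat in patterns:
        for m in re.finditer(pat, text):
            spans.append(m.group(0))
    return spans

print(find_protected("See http://example.com/a?b=1 for details.",
                     protect_patterns))
# -> ['http://example.com/a?b=1']
```

Note that \S+ runs up to the next whitespace, so the whole URL, including the query string, stays in one piece instead of being split at the punctuation.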
