Hi Lefty

Thanks for pointing that out - I fixed it,
cheers - Barry

On 16/10/13 14:09, Eleftherios Avramidis wrote:
> Hi Barry,
>
> I found a typo/bug that explains why it hasn't worked so far here: the
> help message of tokenizer.perl said that the parameter is "-protect",
> but in fact it is "-protected".
>
> best
> Lefteris
>
> On 14/10/13 21:38, Barry Haddow wrote:
>> Hi Lefty
>>
>> For the 'protect' option, the format is one regular expression per
>> line. For example, if you use a file with one line like this:
>>
>> http://\S+
>>
>> then it should protect some URLs from tokenisation. It works for me.
>> If you have problems then send me the file.
>>
>> For the -a option, I think the detokeniser should put the hyphens
>> back together again, but I have not checked.
>>
>> cheers - Barry
>>
>> On 14/10/13 19:22, Eleftherios Avramidis wrote:
>>> Hi,
>>>
>>> I see tokenizer.perl now offers an option for excluding URLs and
>>> other expressions: "-protect FILE ... specify file with patterns to
>>> be protected in tokenisation." Unfortunately there is no explanation
>>> of how this optional file should look. I tried several ways of
>>> writing regular expressions for URLs, but URLs still come out
>>> tokenized. Could you provide an example?
>>>
>>> My second question concerns the -a option, for aggressive hyphen
>>> splitting. Does the detokenizer offer a similar option, to
>>> reconstruct separated hyphens?
>>>
>>> cheers
>>> Lefteris

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
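
For reference, a minimal sketch of the usage discussed in the thread above.
It assumes the usual stdin/stdout invocation of the Moses tokenizer; the file
name url-patterns.txt and the sample input/output paths are only placeholders,
and the flag is -protected (not -protect), as noted above:

    # url-patterns.txt: one Perl regular expression per line;
    # this single pattern protects http URLs
    http://\S+

    # tokenise English text; spans matching a protected pattern
    # (here, URLs) are left intact as single tokens
    perl tokenizer.perl -l en -protected url-patterns.txt < input.txt > output.tok

If URLs still come out split, it is worth checking that the pattern actually
matches the URLs in the input (for instance, the pattern above does not cover
https:// links).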
