Hi Barry,

I found a typo/bug that explains why it hasn't worked for me so far: the help message of tokenizer.perl says that the parameter is "-protect", but in fact it is "-protected".
best
Lefteris

On 14/10/13 21:38, Barry Haddow wrote:
> Hi Lefty
>
> For the 'protect' option, the format is one regular expression per
> line. For example if you use a file with one line like this:
>
> http://\S+
>
> then it should protect some URLs from tokenisation. It works for me.
> If you have problems then send me the file.
>
> For the -a option, I think the detokeniser should put the hyphens back
> together again, but I have not checked.
>
> cheers - Barry
>
> On 14/10/13 19:22, Eleftherios Avramidis wrote:
>> Hi,
>>
>> I see tokenizer.perl now offers an option for excluding URLs and other
>> expressions. " -protect FILE ... specify file with patters to be
>> protected in tokenisation." Unfortunately there is no explanation of how
>> this optional file should be formatted. I tried several ways of writing
>> regular expressions for URLs, but URLs still come out tokenized. Could
>> you provide an example?
>>
>> My second question concerns the -a option, for aggressive hyphen
>> splitting. Does the detokenizer offer a similar option, to reconstruct
>> separated hyphens?
>>
>> cheers
>> Lefteris
>>
>

--
MSc. Inf. Eleftherios Avramidis
DFKI GmbH, Alt-Moabit 91c, 10559 Berlin
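
To illustrate the file format discussed in this thread (one regular expression per line, e.g. http://\S+), here is a minimal Python sketch of how pattern protection can work in principle. It is not the code of tokenizer.perl: the function names, the PROTECTED<n> placeholder scheme, and the crude stand-in tokenizer are all assumptions made for this example.

```python
import re


def load_patterns(text):
    """Parse a 'protected patterns' file body: one regular expression per line.

    Blank lines are skipped. (Hypothetical helper, not part of Moses.)
    """
    return [re.compile(line) for line in text.splitlines() if line.strip()]


def tokenize_protected(sentence, patterns):
    """Tokenize a sentence while leaving spans matched by `patterns` intact.

    Matches are swapped for placeholder words before tokenization and
    restored afterwards. The tokenizer here is a crude stand-in that
    splits every punctuation character into its own token.
    """
    protected = []

    def stash(match):
        # Remember the matched text and leave a placeholder in its place.
        protected.append(match.group(0))
        return f" PROTECTED{len(protected) - 1} "

    for pattern in patterns:
        sentence = pattern.sub(stash, sentence)

    # Stand-in tokenizer: runs of word characters, or single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", sentence)

    restored = []
    for token in tokens:
        placeholder = re.fullmatch(r"PROTECTED(\d+)", token)
        if placeholder:
            restored.append(protected[int(placeholder.group(1))])
        else:
            restored.append(token)
    return " ".join(restored)


if __name__ == "__main__":
    # Same content as a pattern file containing the single line: http://\S+
    patterns = load_patterns(r"http://\S+")
    print(tokenize_protected("Visit http://example.com/page now.", patterns))
    # prints: Visit http://example.com/page now .
```

Note that \S+ is greedy, so punctuation directly attached to a URL (such as a trailing comma) is protected along with it; a stricter pattern would be needed to leave that punctuation for the tokenizer.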
