Hi,

I see tokenizer.perl now offers an option for excluding URLs and other
expressions from tokenization: "  -protect FILE  ... specify file with patters
to be protected in tokenisation." Unfortunately, there is no explanation of
what format this optional file should have. I tried several ways of writing
regular expressions for URLs, but the URLs still come out tokenized. Could
you provide an example?
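
For illustration, here is a minimal Python sketch of the protect-then-restore idea, under stated assumptions: that the protect file holds one regular expression per line, and that each match is swapped for a placeholder token before tokenizing and swapped back afterwards. The pattern list, the `PROTECTED000`-style placeholder and the `toy_tokenize` function are all hypothetical stand-ins, not tokenizer.perl's actual code or file format.

```python
import re

# Hypothetical contents of a protect file: assumed to be one regex per line.
# This URL pattern is illustrative only (it stops at whitespace).
PROTECT_FILE_LINES = [
    r"https?://[^\s]+",
]

def toy_tokenize(text):
    # Stand-in tokenizer (NOT tokenizer.perl): splits punctuation off words.
    return re.sub(r"([^\w\s])", r" \1 ", text).split()

def tokenize_with_protection(text, patterns):
    protected = []

    def stash(match):
        # Remember the matched span and replace it with a numbered placeholder
        # made of word characters only, so the tokenizer leaves it intact.
        protected.append(match.group(0))
        return f" PROTECTED{len(protected) - 1:03d} "

    for pat in patterns:
        text = re.sub(pat, stash, text)

    # Tokenize, then swap each placeholder back for the original span.
    restored = []
    for tok in toy_tokenize(text):
        m = re.fullmatch(r"PROTECTED(\d{3})", tok)
        restored.append(protected[int(m.group(1))] if m else tok)
    return restored

print(tokenize_with_protection("See http://www.statmt.org/moses/ for details.",
                               PROTECT_FILE_LINES))
```

Without the protection step, `toy_tokenize` would break the URL apart at every `:`, `/` and `.`.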

My second question concerns the -a option, for aggressive hyphen
splitting. Does the detokenizer offer a similar option to reconstruct the
separated hyphens?
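
To make the second question concrete, a minimal Python sketch of the round trip in question. It assumes (not confirmed here) that aggressive splitting turns an intra-word hyphen such as "a-b" into "a @-@ b", so a detokenizer would only need to collapse " @-@ " back to "-". The `@-@` marker and both function names are illustrative assumptions.

```python
import re

# Assumption: aggressive splitting marks a split hyphen with this token.
HYPHEN_MARKER = "@-@"

def split_hyphens(text):
    # Split a hyphen only when it sits between two alphanumeric characters
    # ([^\W_] = word characters excluding underscore).
    return re.sub(r"(?<=[^\W_])-(?=[^\W_])", f" {HYPHEN_MARKER} ", text)

def rejoin_hyphens(text):
    # Inverse step for detokenization: " @-@ " collapses back to "-".
    return text.replace(f" {HYPHEN_MARKER} ", "-")

tok = split_hyphens("state-of-the-art system")
print(tok)                  # state @-@ of @-@ the @-@ art system
print(rejoin_hyphens(tok))  # state-of-the-art system
```

Because only hyphens between alphanumerics are marked, a free-standing dash such as "a - b" passes through both steps unchanged.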

cheers
Lefteris

-- 
MSc. Inf. Eleftherios Avramidis
DFKI GmbH, Alt-Moabit 91c, 10559 Berlin
Tel. +49-30 238 95-1806
Fax. +49-30 238 95-1810

-------------------------------------------------------------------------------------------
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Firmensitz: Trippstadter Strasse 122, D-67663 Kaiserslautern

Geschaeftsfuehrung:
Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
Dr. Walter Olthoff

Vorsitzender des Aufsichtsrats:
Prof. Dr. h.c. Hans A. Aukes

Amtsgericht Kaiserslautern, HRB 2313
-------------------------------------------------------------------------------------------

_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
