Miles Osborne wrote:

> this is actually an argument for *preserving case* at all stages,
> rather than throwing information away and only re-introducing it
> later (with possible random effects like that one).

Yeah, John Henderson and I have had that argument, several times. My concern has been that you introduce data scarcity issues by splitting the counts for rare words. I'm beginning to suspect, however, that this would be outweighed by the better discrimination you'd get by keeping case distinctions. Of course, with Moses, you could have two factors, with and without case. I wonder if anyone has tried that ...

> out of curiosity, what happens when you use -1 instead (no limits
> on reordering)

Well, it takes =way= longer, of course. As for scoring, it does poorly, losing as much as a point of BLEU, whether scoring caselessly or not. Interesting to try, though.

- John Burger
  MITRE
_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
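
To make the count-splitting concern above concrete, here is a minimal sketch in Python. The toy corpus and the `singletons` helper are hypothetical, invented only for illustration; nothing here comes from Moses or from the experiments discussed in this thread. It counts the same tokens with and without case and shows both sides of the trade-off: keeping case spreads the evidence for a word across several surface forms, while dropping case merges forms that may be genuinely different words.

```python
from collections import Counter

# Hypothetical toy corpus, made up for illustration only.
corpus = [
    "Apple shares rose",
    "the apple fell",
    "APPLE unveils phone",
    "US officials met us",
]

tokens = [tok for line in corpus for tok in line.split()]

# Case-preserving counts: each surface form is its own type.
cased = Counter(tokens)

# Caseless counts: surface forms that differ only in case are merged.
caseless = Counter(tok.lower() for tok in tokens)


def singletons(counts):
    """Number of types seen exactly once (a rough sparsity indicator)."""
    return sum(1 for c in counts.values() if c == 1)


print("types (cased):   ", len(cased), "singletons:", singletons(cased))
print("types (caseless):", len(caseless), "singletons:", singletons(caseless))

# Sparsity side: three case variants share one caseless count.
print(cased["Apple"], cased["apple"], cased["APPLE"], "->", caseless["apple"])

# Discrimination side: "US" and "us" collapse into a single type.
print(cased["US"], cased["us"], "->", caseless["us"])
```

On real training data the effect will depend on corpus size and language, so this only shows the mechanism, not which choice gives better BLEU.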
