Hi James,

There has been a vast literature on adaptation techniques for SMT in
recent years. Some reading suggestions:

http://www.statmt.org/wmt07/pdf/WMT17.pdf
http://www.statmt.org/wmt09/pdf/WMT-0932.pdf
http://dl.acm.org/citation.cfm?id=1870702
http://www.mt-archive.info/IWSLT-2011-Bisazza.pdf
http://www.aclweb.org/anthology/P12-1099
http://amta2012.amtaweb.org/AMTA2012Files/papers/115.pdf
http://amta2012.amtaweb.org/AMTA2012Files/papers/152.pdf
http://www.hltpr.rwth-aachen.de/publications/download/832/Mansour-IWSLT-2012.pdf
http://www.aclweb.org/anthology/P/P13/P13-1141.pdf

See also http://www.statmt.org/survey/Topic/DomainAdaptation .
You'll be able to find many more interesting papers about that topic.
Not sure to what extent your idea differs from what has been suggested
in previous work.

Regarding your question about distortion, you may also want to consult
the literature first. Philipp Koehn wrote a nice textbook about all the
basics of SMT:
http://www.statmt.org/book/

Cheers,
Matthias

On Wed, 2013-11-06 at 09:35 -0800, Kenneth Heafield wrote:
> Hi,
>
> Multiple column won't really work because the set of phrase pairs will
> be different. You could of course take the union of phrase pairs and
> just have null values for inapplicable phrases, but it's not clear how
> much compression you'd get.
>
> Kenneth
>
> On 11/06/13 06:21, Read, James C wrote:
> > So here's a random crazy idea I had lately. A phrase table could have
> > multiple columns giving different scores for different probabilities from
> > different alignments, different corpora, different domains etc. Recent work
> > at Edinburgh, Cambridge and Sheffield has had some emphasis on adaptation
> > of models for speech recognition purposes. I guess a similar principle
> > could be applied to SMT. Given a text from some unknown domain the engine
> > could perform some automated recognition test to guess which translation
> > model best fits the text to be translated. A primitive form of automatic
> > domain recognition and adaptation if you like.
> >
> > I guess even making available multiple forms of a phrase table or a single
> > compact version with multiple columns for scoring could even have some
> > demand in the future.
> >
> > James

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
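
For illustration only, the "union of phrase pairs with null values" idea from Kenneth's reply can be sketched in a few lines of Python. This is a toy in-memory sketch, not the Moses phrase table format or any Moses API: the names merge_tables and null_fraction, the single score per phrase pair, and the two tiny example tables are all assumptions made up for this example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy representation: (source phrase, target phrase) -> one score.
PhrasePair = Tuple[str, str]
Table = Dict[PhrasePair, float]


def merge_tables(tables: List[Table]) -> Dict[PhrasePair, List[Optional[float]]]:
    """Take the union of phrase pairs; one score column per input table.

    A pair that is absent from a table gets None in that table's column.
    """
    merged: Dict[PhrasePair, List[Optional[float]]] = {}
    for pair in set().union(*tables):
        merged[pair] = [table.get(pair) for table in tables]
    return merged


def null_fraction(merged: Dict[PhrasePair, List[Optional[float]]]) -> float:
    """Fraction of score cells that are None (a rough proxy for wasted space)."""
    cells = [score for scores in merged.values() for score in scores]
    return sum(score is None for score in cells) / len(cells) if cells else 0.0


# Made-up example tables for two domains.
news = {("das haus", "the house"): 0.8, ("haus", "house"): 0.9}
medical = {("haus", "house"): 0.7, ("das herz", "the heart"): 0.95}

merged = merge_tables([news, medical])
print(merged[("haus", "house")])
print(null_fraction(merged))
```

The less the domain-specific tables overlap, the larger the share of null cells, which is the reason the compression benefit of a single multi-column table is unclear.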
