Dear all,

I have implemented a data selection tool for domain adaptation based on the Invitation Model, as described in:

Hoang, Cuong and Sima'an, Khalil (2014): Latent Domain Translation Models in Mix-of-Domains Haystack. Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics. http://www.aclweb.org/anthology/C14-1182.pdf

The tool is available in the following GitHub repository: https://github.com/amirkamran/InvitationModel

The invitation-based data selection approach exploits in-domain data (both monolingual and bilingual) as a prior to guide word alignment and phrase pair estimates in a large mixed-domain corpus. As a by-product, it produces accurate estimates of P(D|e,f) for the mixed-domain sentences (with D being either in-domain or out-of-domain), which can be used to rank the sentences in the mixed-domain corpus according to their relevance to the in-domain corpus.

This work was conducted at the ILLC (Institute for Logic, Language and Computation, University of Amsterdam), https://www.illc.uva.nl, as part of the project "Data-Powered Domain-Specific Translation Services On Demand", supported by the grant "STW Open Technologieprogramma".

Regards,
Amir Kamran
Research Programmer
Institute for Logic, Language and Computation (ILLC)
University of Amsterdam
