Dear Marcin,

I have uploaded my EMS files for WMT'16:
https://github.com/bicici/ParFDAWMT16

Text processing steps can be language-dependent, may require domain knowledge and expertise, and can distinguish you from other participants by improving your results. I suggest reading the relevant sections of the WMT participants' papers to get a feel for the computational requirements that are not necessarily made obvious, such as the use of unsupervised learning of classes in language models and alignment.

Text processing helps the datasets take the form you would like them to have, even if you consider it evil. If removing punctuation from some dataset helps, then this may be considered ingenious as well.

Barry Haddow has prepared preprocessed WMT'17 datasets:
http://data.statmt.org/wmt17/translation-task/preprocessed/
http://www.statmt.org/wmt17/translation-task.html

Regards,
Ergun

On Sun, Nov 26, 2017 at 12:41 PM, Marcin Junczys-Dowmunt <[email protected]> wrote:
> Hi list,
>
> I am preparing a couple of usage example for my NMT toolkit and got hung
> up on all the preprocessing and other evil stuff. I am wondering is
> there now anything decent around for doing preprocessing, running
> experiments and evaluation? Or is the best thing still GNU make (isn't
> that embarrassing)?
>
> Best,
>
> Marcin
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>

--
Regards,
Ergun
