Hi Ondrej,

I am not aiming so much at quality improvement; I just want a very flexible data structure. I am planning a series of experiments in which cruel and twisted things will be done to the source language, and I thought this might be a good way to speed up experimenting without having to retrain every time. Your idea should not be hard to implement; I will take it into account.

Best,
Marcin
On 24.07.2013 15:59, Ondrej Bojar wrote:
> Hi, Marcin,
>
> this could be quite useful (although we did not see any improvements in our
> experiments, see our WMT(12,11?) paper Selecting Data in EN->CS Translation).
>
> It can be useful to index a different factor from the factors that one
> eventually wants to use in translation. So the config should allow one to say: of
> all the factors, I want to index these (e.g. 1, the lemma) in the source side
> of the parallel corpus, and I want to see these as the source side of the
> phrases (e.g. 0&2, the form and the tag). It is also a question whether the
> input sentence should be expected to have the same structure of factors as
> the training corpus.
>
> Looking forward to whatever you decide to implement.
>
> Cheers, O.
>
> "Hieu Hoang" <[email protected]> wrote:
>
>> yes, copy PhraseDictionaryDynSuffixArray, or indeed, your own
>> PhraseDictionaryCompact.
>>
>> there are some docs on adding feature functions:
>> http://www.statmt.org/moses/?n=Moses.FeatureFunctions
>>
>> it should be easier than a year ago when you added PhraseDictionaryCompact.
>>
>> On 24 July 2013 13:29, Marcin Junczys-Dowmunt <[email protected]> wrote:
>>
>>> Hi list,
>>> I am planning to integrate a dynamic phrase table based on a Lucene
>>> index and I am wondering how to approach it. Basically it should have
>>> the same functionality as the Dynamic Suffix Array already present in
>>> Moses. Now I am wondering if, for a first working version, it would be
>>> enough just to hijack DynSuffixArray.h and implement all the public
>>> functions with CLucene. Any hidden pitfalls there? The index would be
>>> keyed by the source sentence and contain the alignment data and target
>>> sentences as fields.
>>> Best,
>>> Marcin
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>
>>
>> --
>> Hieu Hoang
>> Research Associate
>> University of Edinburgh
>> http://www.hoang.co.uk/hieu
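A minimal C++ sketch of Ondrej's suggestion of indexing one factor while presenting others on the phrase side. It assumes the usual Moses factored-corpus convention of whitespace-separated tokens whose factors are joined by '|', in the order form|lemma|tag implied by his example (factor 1 = lemma, factors 0 and 2 = form and tag). The names SplitFactors and ProjectFactors are hypothetical helpers, not part of Moses or CLucene. The same projection would be applied once with the index factors to build the search key and once with the output factors to build the phrase side.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split one factored token such as "houses|house|NNS" on the separator.
// Assumption: '|' is the factor separator, as in Moses factored corpora.
std::vector<std::string> SplitFactors(const std::string& token, char sep = '|') {
  std::vector<std::string> factors;
  std::string current;
  for (char c : token) {
    if (c == sep) {
      factors.push_back(current);
      current.clear();
    } else {
      current += c;
    }
  }
  factors.push_back(current);
  return factors;
}

// Project a whitespace-tokenised factored sentence onto the chosen factor
// indices (hypothetical helper). Tokens are joined by single spaces and the
// selected factors inside a token are re-joined with the separator.
// vector::at throws std::out_of_range if a token has too few factors.
std::string ProjectFactors(const std::string& sentence,
                           const std::vector<std::size_t>& keep,
                           char sep = '|') {
  assert(!keep.empty() && "select at least one factor");
  std::istringstream in(sentence);
  std::string token;
  std::string result;
  while (in >> token) {
    std::vector<std::string> factors = SplitFactors(token, sep);
    if (!result.empty()) result += ' ';
    for (std::size_t i = 0; i < keep.size(); ++i) {
      if (i > 0) result += sep;
      result += factors.at(keep[i]);
    }
  }
  return result;
}
```

For example, projecting "the|the|DT houses|house|NNS" onto {1} gives the index key "the house", and projecting it onto {0, 2} gives the phrase side "the|DT houses|NNS". Whether the input sentence must carry the same factor layout as the training corpus, the open question Ondrej raises, is left to the caller here: the sketch simply fails with an exception when a requested factor is missing.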
