So how does that work? Does it just take all the words from the corpus and guess the "infix themes"? Or do I have to supply pre-tagged data?

On Mon, Feb 1, 2016 at 9:04 AM, Rico Sennrich <[email protected]> wrote:

> Hi Mike,
>
> here's a link to the tool Marcin mentioned:
> https://github.com/rsennrich/subword-nmt
>
> I haven't tried it on phrase-based MT myself, but feel free to give it a
> try.
>
> You could also try other unsupervised morpheme segmenters like morfessor:
> https://github.com/aalto-speech/morfessor
>
> I don't know if there are any segmentation methods specific to Cherokee.
>
> best wishes,
> Rico
>
> On 01.02.2016 13:31, Marcin Junczys-Dowmunt wrote:
>
>> Hi Mike,
>>
>> Maybe take a look at Rico's tool for handling unknown words in neural
>> machine translation. I have been playing around with that for
>> Russian-English and standard phrase-based SMT with some success. I am
>> just not sure if your small corpora will be enough to learn useful
>> segmentations though.
>>
>> It's an unsupervised method for word segmentation. For Russian-English I
>> created a code dictionary of the 100,000 most-frequent segments per
>> language. Unseen tokens will get segmented. The segmentation is not
>> necessarily similar to a linguistically correct segmentation, though.
>> You will probably want to try smaller numbers.
>>
>> Best,
>>
>> Marcin
>>
>> On 2016-02-01 14:12, Michael Joyner wrote:
>>
>>> I am trying to use Moses with Cherokee using the New Testament and
>>> Genesis as the primary corpus. I am feeding it the WEB and BBE as
>>> source English texts at the moment.
>>>
>>> As Cherokee uses bound pronouns and no articles and has almost nil
>>> preposition analogues (these features are mostly verb infixes), is
>>> there a technique for corpus adjustment that can be done to improve
>>> the phrase mapping between Cherokee and English?
>>>
>>> I am currently doing Cherokee => English.
>>>
>>> Thanks, Mike
>>> --
>>> WEB: World English Bible (Public Domain)
>>> BBE: Basic English Bible (Public Domain)
>>>
>>> - Learn to the Cherokee language: http://jalagigawoni.gnomio.com/

--
- Learn to the Cherokee language: http://jalagigawoni.gnomio.com/
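
As a rough illustration of the unsupervised segmentation Marcin describes in the quoted message: subword-nmt is based on byte-pair encoding (BPE), which learns its "code dictionary" purely from word frequencies in the raw corpus, so no pre-tagged data is involved. The sketch below is a toy version of that idea, not the tool's actual code or API. The function names (learn_bpe, merge_pair, segment), the "</w>" end-of-word marker and the English toy counts are all assumptions made for the example; the number of merges plays the role of the dictionary size that Marcin suggests keeping small for a small corpus.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge operations from raw word counts (no tags needed)."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Pick the most frequent pair; ties are broken by the pair itself.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges

def segment(word, merges):
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

# Hypothetical toy counts standing in for a real corpus vocabulary.
toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(toy_counts, 10)

# "lowest" never occurs in the toy counts, but it is still split into learned segments.
print(segment("lowest", merges))
```

The segments that come out are frequency-driven, which is why they need not line up with linguistically motivated morpheme boundaries.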
_______________________________________________ Moses-support mailing list [email protected] http://mailman.mit.edu/mailman/listinfo/moses-support
