Hello,

My question is about moses.ini: if we use IRSTLM, should we replace KENLM with IRSTLM in moses.ini?
Thanks

On Thu, Nov 26, 2015 at 6:00 PM, <[email protected]> wrote:

> Today's Topics:
>
>    1. Re: Language model question (Dingyuan Wang)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 27 Nov 2015 00:05:51 +0800
> From: Dingyuan Wang <[email protected]>
> Subject: Re: [Moses-support] Language model question
> To: Vincent Nguyen <[email protected]>
> Cc: moses-support <[email protected]>
> Message-ID: <caft8h74h6ta+ijkc_chao2dvuchqnonvk64q+jdn99jk5b-...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi,
>
> I tend to fix it in the tokenization script, or I would solve this in some
> pre-processing scripts if there are any obvious patterns in the noise.
>
> --
> Dingyuan
>
> On Nov 26, 2015 at 21:09, "Vincent Nguyen" <[email protected]> wrote:
>
> > Hi all,
> >
> > I have a question regarding LMs.
> >
> > Let's take the example of news.2014.shuffle.en
> >
> > When we process it through punctuation normalization for the English
> > language, it will, for instance, put a " " before an apostrophe:
> > "it is'nt" => "it is 'nt"
> >
> > BUT it contains some noise. For instance, there are some French sentences
> > in the corpus, for which the apostrophe processing is not suited:
> > "j'aime" => "j 'aime" => it will create the token 'aime
> >
> > My point is the following:
> >
> > At the LM building stage, how can we prune to eliminate tokens like
> > "'aime" so that they do not create wrong unigrams, bigrams, ...
> >
> > The ngram -minprune option only takes 2 as a minimum, so wrong unigrams
> > will still be included in the LM.
> >
> > Hope I'm clear enough ....
> >
> > Vincent
>
> ------------------------------
>
> End of Moses-support Digest, Vol 109, Issue 70
> **********************************************
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
