Hi,
the decoder does not know whether the language model was
trained with -unk. Training with -unk is recommended. Without
<unk> in the model, unseen words can get very low language model
probabilities, and the decoder floors such log-probabilities at -100.
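To illustrate the floor, here is a minimal sketch. It is not the actual Moses code: the names kLogProbFloor and FloorScore are made up for this example, and only the value -100 comes from the description above.

```cpp
#include <algorithm>

// Hypothetical constant for this sketch; -100 is the floor described above.
const float kLogProbFloor = -100.0f;

// Clamp a log-probability so that it never falls below the floor.
// A model without <unk> may report an extremely low (even negative
// infinite) log-probability for an unseen word; the floor keeps the
// score finite.
float FloorScore(float logProb)
{
  return std::max(logProb, kLogProbFloor);
}
```

A score of -2.5 passes through unchanged, while anything below -100 (including negative infinity) comes back as -100.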
Here is the part of LanguageModelSRI.cpp where the language
model is loaded:
bool LanguageModelSRI::Load(const std::string &filePath
                            , FactorType factorType
                            , float weight
                            , size_t nGramOrder)
{
  m_srilmVocab = new Vocab();
  m_srilmModel = new Ngram(*m_srilmVocab, nGramOrder);
  m_factorType = factorType;
  m_weight = weight;
  m_nGramOrder = nGramOrder;
  m_filePath = filePath;
  m_srilmModel->skipOOVs() = false;
  File file( filePath.c_str(), "r" );
  m_srilmModel->read(file);
  // LM can be ok, just outputs warnings
  CreateFactors();
  m_unknownId = m_srilmVocab->unkIndex();
  return true;
}
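
The loader above stores the vocabulary's unknown-word index in m_unknownId. One plausible use of such an ID is as a fallback when a word is missing from the vocabulary. The sketch below shows that idea only. ToyVocab and Lookup are hypothetical names for this example; they are not the SRILM Vocab class or the Moses lookup code.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a vocabulary (NOT the SRILM Vocab class).
struct ToyVocab
{
  std::map<std::string, std::size_t> ids; // known words and their IDs
  std::size_t unkId;                      // ID reserved for unknown words

  // Return the word's ID, or the unknown-word ID if the word is unseen.
  std::size_t Lookup(const std::string &word) const
  {
    std::map<std::string, std::size_t>::const_iterator it = ids.find(word);
    return (it == ids.end()) ? unkId : it->second;
  }
};
```

With this kind of fallback, whether an unseen word gets a sensible score depends on whether the model actually has a probability for <unk>, which is why training with -unk matters.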
-phi
On Sat, May 16, 2009 at 5:27 AM, Hongfei Jiang <[email protected]> wrote:
> Hi, all
> If I train a language model using SRILM, I can use the option '-unk' or
> not.
> And when a decoder loads the language model, the unk option must be
> specified.
> As for Moses, how can it know whether the language model being loaded
> was trained with '-unk' or not?
> Is there any item in moses.ini to indicate this, or can the decoder
> automatically detect <unk> in the input language model?
> Best Regards,
> -Fei
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support