Hi,

The decoder does not know whether the language model was
trained with -unk. Training with -unk is recommended. The decoder
floors language model log probabilities at -100, a limit that
unseen words can hit if <unk> is not in the model.

Here is the part of LanguageModelSRI.cpp where the language
model is loaded:

bool LanguageModelSRI::Load(const std::string &filePath
                            , FactorType factorType
                            , float weight
                            , size_t nGramOrder)
{
  m_srilmVocab  = new Vocab();
  m_srilmModel  = new Ngram(*m_srilmVocab, nGramOrder);
  m_factorType  = factorType;
  m_weight      = weight;
  m_nGramOrder  = nGramOrder;
  m_filePath    = filePath;

  m_srilmModel->skipOOVs() = false;

  File file( filePath.c_str(), "r" );
  m_srilmModel->read(file);

  // LM can be ok, just outputs warnings
  CreateFactors();
  m_unknownId = m_srilmVocab->unkIndex();

  return true;
}
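
Load() above does not check for <unk> itself. If you want to check a model
yourself before decoding, something like the following sketch could work.
It is not part of Moses: ArpaHasUnk is a hypothetical helper, and it assumes
an uncompressed text ARPA file in which the unknown-word token is spelled
<unk>.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper, not part of Moses or SRILM.
// Returns true if the unigram section of an ARPA-format stream lists <unk>.
bool ArpaHasUnk(std::istream &in)
{
  std::string line;
  bool inUnigrams = false;
  while (std::getline(in, line)) {
    // Tolerate Windows line endings.
    if (!line.empty() && line[line.size() - 1] == '\r')
      line.erase(line.size() - 1);
    if (line == "\\1-grams:") {
      inUnigrams = true;
      continue;
    }
    if (!inUnigrams)
      continue;
    // A line starting with a backslash opens the next section or \end\.
    if (!line.empty() && line[0] == '\\')
      break;
    // Unigram entries have the form: logprob word [backoff]
    std::istringstream fields(line);
    std::string logProb, word;
    if (fields >> logProb >> word && word == "<unk>")
      return true;
  }
  return false;
}
```

To use it, open the model with std::ifstream and pass the stream to
ArpaHasUnk(). Only the unigram section is read, so the check stops early
even for large models.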


-phi

On Sat, May 16, 2009 at 5:27 AM, Hongfei Jiang <[email protected]> wrote:
> Hi, all
>       If I train a language model using SRILM, I can choose whether or not
> to use the option '-unk'.
>       And when the decoder loads the language model, it must specify the unk
> option.
>       As for Moses, how can it know whether the language model being loaded
> was trained with '-unk' or not?
>       Is there any item in moses.ini to indicate this, or can the decoder
> automatically detect whether the input language model contains <unk>?
>    Best Regards,
> -Fei
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>

