Hi!

I am trying to modify the Moses code (2009-04-13 version) so that it uses a
supplied list of custom translation options instead of depending on the
phrase translation tables.

For example, for the input "This is a house", I would like to supply
word-level options for each token:

This                          is                            ...
TRANSLATION OPTION 1          TRANSLATION OPTION 1
TRANSLATION OPTION 2          TRANSLATION OPTION 2
TRANSLATION OPTION 3          TRANSLATION OPTION 3

and so on for the remaining tokens.

That is, for each token t_i I would like to supply my own list of options
t_ij.
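
To make this concrete, here is a rough sketch of the input I have in mind,
written as standalone C++ rather than against the Moses API. The names
CustomOption, CustomOptionTable, AddOption, LookupOptions and
MakeExampleTable are my own placeholders, not Moses classes, and the scores
are dummy values.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical container for one user-supplied translation option.
// None of these names come from Moses; they only illustrate the idea.
struct CustomOption {
    std::string target;  // candidate translation t_ij for source token t_i
    float score;         // user-supplied score (placeholder value here)
};

// Maps each source token t_i to its list of candidates t_ij.
typedef std::map<std::string, std::vector<CustomOption> > CustomOptionTable;

// Appends one candidate translation to the list of the given token.
void AddOption(CustomOptionTable &table, const std::string &token,
               const std::string &target, float score) {
    assert(!token.empty());
    CustomOption option;
    option.target = target;
    option.score = score;
    table[token].push_back(option);
}

// Returns the options for one token, or an empty list if none were supplied.
std::vector<CustomOption> LookupOptions(const CustomOptionTable &table,
                                        const std::string &token) {
    CustomOptionTable::const_iterator it = table.find(token);
    if (it == table.end()) return std::vector<CustomOption>();
    return it->second;
}

// Builds the toy table from the example: three options each for
// "This" and "is", nothing for the other tokens.
CustomOptionTable MakeExampleTable() {
    CustomOptionTable table;
    AddOption(table, "This", "TRANSLATION OPTION 1", 0.0f);
    AddOption(table, "This", "TRANSLATION OPTION 2", 0.0f);
    AddOption(table, "This", "TRANSLATION OPTION 3", 0.0f);
    AddOption(table, "is", "TRANSLATION OPTION 1", 0.0f);
    AddOption(table, "is", "TRANSLATION OPTION 2", 0.0f);
    AddOption(table, "is", "TRANSLATION OPTION 3", 0.0f);
    return table;
}
```

With this table, LookupOptions(table, "This") returns three candidates and
LookupOptions(table, "house") returns an empty list, so tokens without
supplied options would still need some fallback.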

I would like to continue using the language model scores etc.

I went through the paper on the design of the Moses decoder ("Design of the
Moses Decoder for Statistical Machine Translation") and the code
documentation available online, but I need some help.

From my understanding, the translation options for each token are stored in
the TranslationOptionCollection class (TranslationOptionCollection.cpp).

The Manager class (Manager.cpp) instantiates the TranslationOptionCollection
2-D array, initializes the hypothesis stack, and also initializes the phrase
tables, reordering models and language models.

In Manager.cpp, line 59:

    const StaticData &staticData = StaticData::Instance();
    staticData.InitializeBeforeSentenceProcessing(source);

However, in Manager.cpp at line 88:

    const vector <DecodeGraph*>
            &decodeStepVL = staticData.GetDecodeStepVL();
    m_transOptColl->CreateTranslationOptions(decodeStepVL);

*What are the DecodeGraph and the DecodeStep data structures used for?*

DecodeStep.cpp, line 5:

DecodeStep::DecodeStep(Dictionary *ptr, const DecodeStep* prev)
    : m_ptr(ptr)
{
    FactorMask prevOutputFactors;
    if (prev) prevOutputFactors = prev->m_outputFactors;
    m_outputFactors = prevOutputFactors;
    FactorMask conflictMask = (m_outputFactors & ptr->GetOutputFactorMask());
    m_outputFactors |= ptr->GetOutputFactorMask();
    FactorMask newOutputFactorMask = m_outputFactors ^ prevOutputFactors; // xor
    m_newOutputFactors.resize(newOutputFactorMask.count());
    m_conflictFactors.resize(conflictMask.count());
    size_t j = 0, k = 0;
    for (size_t i = 0; i < MAX_NUM_FACTORS; i++) {
        if (newOutputFactorMask[i]) m_newOutputFactors[j++] = i;
        if (conflictMask[i]) m_conflictFactors[k++] = i;
    }
    VERBOSE(2, "DecodeStep():\n\toutputFactors=" << m_outputFactors
        << "\n\tconflictFactors=" << conflictMask
        << "\n\tnewOutputFactors=" << newOutputFactorMask << std::endl);
}

*This seems to do some phrase-table-specific processing, but I am not sure.*
*Considering that I don't need the phrase table and will provide my own
list of translation options using my own method, what changes do I need to
make to DecodeStep, if any?*
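
To check my reading of that constructor, I restated its mask arithmetic as
standalone code. This is only a sketch: I am assuming FactorMask behaves like
a std::bitset (the calls to count() and operator[] suggest so), and Mask,
kNumFactors, StepMasks and ComputeStepMasks are my own stand-in names, not
Moses identifiers.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

const std::size_t kNumFactors = 4;      // stand-in for MAX_NUM_FACTORS
typedef std::bitset<kNumFactors> Mask;  // stand-in for FactorMask

// Result of the mask arithmetic for one decode step.
struct StepMasks {
    Mask outputFactors;                         // factors produced so far
    std::vector<std::size_t> newOutputFactors;  // first produced by this step
    std::vector<std::size_t> conflictFactors;   // produced before AND by this step
};

// prev: factors already produced by earlier steps.
// own:  factors produced by this step's dictionary.
StepMasks ComputeStepMasks(const Mask &prev, const Mask &own) {
    StepMasks result;
    Mask conflict = prev & own;             // overlap with earlier steps
    result.outputFactors = prev | own;      // union of everything produced
    Mask fresh = result.outputFactors ^ prev;  // xor: bits added by this step
    for (std::size_t i = 0; i < kNumFactors; ++i) {
        if (fresh[i]) result.newOutputFactors.push_back(i);
        if (conflict[i]) result.conflictFactors.push_back(i);
    }
    return result;
}
```

For example, if earlier steps produced factor 0 and this step produces
factors 0 and 1, then factor 1 is new and factor 0 is a conflict. If there
is no previous step, everything this step produces is new and nothing
conflicts, which is the case I expect to be in with a single custom source
of options.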

After this step I guess I can continue with the
SearchNormal::ProcessSentence() call that is made in Manager.cpp (Line 100)
without changes to get the best hypothesis.

*To summarize:*
In the decoding step, I would like to change the source of the translation
options used for beam search to my own custom-generated lists.
I will write some code that loads the TranslationOptionCollection data
structure from my custom source instead of from the phrase table.
I would like to continue using the LM and the other models that are already
used for the translation.
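
The loading step I have in mind could look roughly like this. I am assuming
the collection is indexed by the start and end position of a source span,
which is how I read the 2-D array in the code. SpanOptions and
LoadWordLevelOptions are hypothetical names, not Moses classes, and real
options would carry scores and factors rather than bare strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical span-indexed store: cell (start, end) holds the candidate
// translations for source words start..end (inclusive).
class SpanOptions {
public:
    explicit SpanOptions(std::size_t sentenceLength)
        : m_cells(sentenceLength) {
        // Row `start` only needs cells for end = start .. sentenceLength - 1.
        for (std::size_t start = 0; start < sentenceLength; ++start)
            m_cells[start].resize(sentenceLength - start);
    }

    void Add(std::size_t start, std::size_t end, const std::string &target) {
        assert(start <= end && end < m_cells.size());
        m_cells[start][end - start].push_back(target);
    }

    const std::vector<std::string> &Get(std::size_t start,
                                        std::size_t end) const {
        assert(start <= end && end < m_cells.size());
        return m_cells[start][end - start];
    }

private:
    std::vector<std::vector<std::vector<std::string> > > m_cells;
};

// Fills only the single-word spans (i, i) from a per-token list of
// candidates; all multi-word spans stay empty.
SpanOptions LoadWordLevelOptions(
        const std::vector<std::vector<std::string> > &perToken) {
    SpanOptions options(perToken.size());
    for (std::size_t i = 0; i < perToken.size(); ++i)
        for (std::size_t j = 0; j < perToken[i].size(); ++j)
            options.Add(i, i, perToken[i][j]);
    return options;
}
```

Since I only supply word-level options, only the cells on the diagonal would
be filled, and the search would have to build the output from single-word
options alone.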

I would greatly appreciate any help/guidance that you may be able to
provide.
Looking forward to hearing from you!

Thank You,
Regards

Danish Contractor
_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support