hey danish

There's the XML input option, which you can use to supply custom translations:
    http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc4
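
As a rough sketch of how per-token option lists could be turned into that markup, the helper below wraps each source token in a tag carrying its candidates. The `translation` and `prob` attributes and the `||` separator are written from memory of the page linked above, so check them against it (and against the 2009-04-13 code) before relying on them. The tag name `x` and the names `Candidate`, `MarkupToken` and `MarkupSentence` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper type: one candidate translation with a probability.
struct Candidate {
  std::string target;
  double prob;
};

// Wrap one source token in markup that lists its candidates.
// Assumption: attributes are translation="a||b" prob="p1||p2".
// Tokens without candidates are passed through untouched.
// Note: no escaping is done, so quotes or angle brackets in the
// text are not handled by this sketch.
std::string MarkupToken(const std::string &token,
                        const std::vector<Candidate> &cands) {
  if (cands.empty()) return token;
  std::ostringstream translations, probs;
  for (std::size_t i = 0; i < cands.size(); ++i) {
    if (i > 0) {
      translations << "||";
      probs << "||";
    }
    translations << cands[i].target;
    probs << cands[i].prob;
  }
  return "<x translation=\"" + translations.str() + "\" prob=\"" +
         probs.str() + "\">" + token + "</x>";
}

// Build one input line: tokens and their option lists must line up.
std::string MarkupSentence(const std::vector<std::string> &tokens,
                           const std::vector<std::vector<Candidate> > &options) {
  assert(tokens.size() == options.size());
  std::string out;
  for (std::size_t i = 0; i < tokens.size(); ++i) {
    if (i > 0) out += ' ';
    out += MarkupToken(tokens[i], options[i]);
  }
  return out;
}
```

The resulting line would then be passed to the decoder with XML input switched on (the linked page describes the -xml-input setting). As far as I understand, the language model is still applied to these options as usual.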

If you want to create your own translation options, a good place to look is
   TranslationOptionCollection::ProcessOneUnknownWord() (TranslationOptionCollection.cpp, line 211)
to see how translation options can be created without a phrase table.

The one tricky thing when creating your own translation options is making sure that the phrases they use aren't cleaned up while decoding is still in progress, and that they are freed once decoding of the sentence completes. This is normally handled by the phrase table class. In unknown word processing, the data used is put into m_unksrcs, which is cleaned up after decoding. You have to do something similar.
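
A minimal sketch of that ownership pattern, not actual Moses code: every type and name below (`ToyPhrase`, `ToyTransOpt`, `SentenceStore`) is a hypothetical stand-in. The idea is only that one per-sentence container owns the phrases, the options merely borrow pointers into it, and everything is released in a single call once the sentence is done.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a phrase object (not the Moses Phrase class).
struct ToyPhrase {
  std::string words;
};

// Hypothetical stand-in for a translation option.
// It only borrows its source phrase; it never deletes it.
struct ToyTransOpt {
  const ToyPhrase *source;  // owned by SentenceStore
  std::string target;
  float score;
};

// Owns every phrase created for the current sentence.
class SentenceStore {
 public:
  // Store a phrase and return a pointer that stays valid until
  // CleanUpAfterSentence() is called. Each phrase lives in its own
  // heap allocation, so growing the vector does not move it.
  const ToyPhrase *Keep(const std::string &words) {
    m_phrases.push_back(std::unique_ptr<ToyPhrase>(new ToyPhrase{words}));
    return m_phrases.back().get();
  }

  std::size_t Size() const { return m_phrases.size(); }

  // Call once decoding of the sentence has finished. After this,
  // every pointer handed out by Keep() is dangling.
  void CleanUpAfterSentence() { m_phrases.clear(); }

 private:
  std::vector<std::unique_ptr<ToyPhrase> > m_phrases;
};
```

In use, you would create one store per sentence, call `Keep()` while building your options, run the search, and call `CleanUpAfterSentence()` only after the best hypothesis has been read out.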

DecodeGraph and DecodeStep are used to create systems with multiple phrase tables and generation tables. You don't need to touch these for now.

hieu

ps. what's your real name?
pps. please subscribe to the mailing list before posting to it,
           http://mailman.mit.edu/mailman/listinfo/moses-support
        otherwise your mail will keep bouncing

On 10/01/2010 15:42, Danish Contractor wrote:
Hi!

I am trying to tweak the Moses code (2009-04-13 version) to use a supplied list of custom translation options instead of depending on the phrase translation tables.

For example, for the input "This is a house":

I would like to supply word-level options for each token:

This                    is                      ... and so on
TRANSLATION OPTION 1    TRANSLATION OPTION 1
TRANSLATION OPTION 2    TRANSLATION OPTION 2
TRANSLATION OPTION 3    TRANSLATION OPTION 3

i.e. for a token t_i, I would like to supply my own list of options t_ij.

I would like to continue using the language model scores etc.

I went through the paper on the design of the Moses decoder ("Design of the Moses Decoder for Statistical Machine Translation") and the code documentation available online, but I need some help.

From my understanding, the translation options for each token are stored in the
TranslationOptionCollection data structure (TranslationOptionCollection.cpp).

The Manager class (Manager.cpp) instantiates the TranslationOptionCollection 2D array, initializes the hypothesis stack, and also initializes the phrase tables, reordering models and language models.
In Manager.cpp
Line 59:
    const StaticData &staticData = StaticData::Instance();
    staticData.InitializeBeforeSentenceProcessing(source);

However,
In Manager.cpp
at Line 88:

    const vector <DecodeGraph*> &decodeStepVL = staticData.GetDecodeStepVL();
    m_transOptColl->CreateTranslationOptions(decodeStepVL);

*What are the DecodeGraph and the DecodeStep data structures used for?*

DecodeStep.cpp
Line 5

DecodeStep::DecodeStep(Dictionary *ptr, const DecodeStep* prev)
:m_ptr(ptr)
{
    FactorMask prevOutputFactors;
    if (prev) prevOutputFactors = prev->m_outputFactors;
    m_outputFactors = prevOutputFactors;
    FactorMask conflictMask = (m_outputFactors & ptr->GetOutputFactorMask());
    m_outputFactors |= ptr->GetOutputFactorMask();
    FactorMask newOutputFactorMask = m_outputFactors ^ prevOutputFactors; //xor
    m_newOutputFactors.resize(newOutputFactorMask.count());
    m_conflictFactors.resize(conflictMask.count());
    size_t j=0, k=0;
    for (size_t i = 0; i < MAX_NUM_FACTORS; i++) {
        if (newOutputFactorMask[i]) m_newOutputFactors[j++] = i;
        if (conflictMask[i]) m_conflictFactors[k++] = i;
    }
    VERBOSE(2,"DecodeStep():\n\toutputFactors=" << m_outputFactors
        << "\n\tconflictFactors=" << conflictMask
        << "\n\tnewOutputFactors=" << newOutputFactorMask << std::endl);
}

*This seems to do some phrase-table-specific processing, but I am not sure. Considering I don't need the phrase table and will provide my own TranslationListOption using my own method, what changes do I need to make to DecodeStep, if any?*

After this step I guess I can continue with the
SearchNormal::ProcessSentence() call that is made in Manager.cpp (Line 100) without changes to get the best hypothesis.

*To summarize,*
*In the decoding step, I would like to change the source of the translation options used for beam search to my own custom-generated lists.* I will write some code to fill the TranslationOptionCollection data structure from my custom source instead of the phrase table. I would like to continue using the LM etc. that were used earlier for the translation.

I would greatly appreciate any help/guidance that you may be able to provide.
Looking forward to hearing from you!

Thank You,
Regards

Danish Contractor
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
