Hi!
I am trying to tweak the Moses code (2009-04-13 version) to use a supplied
list of custom translation options instead of depending on the phrase
translation tables.
For example, for the input "This is a house", I would like to supply
word-level options for each token:

    This                   is                     ...   (and so on)
    TRANSLATION OPTION 1   TRANSLATION OPTION 1
    TRANSLATION OPTION 2   TRANSLATION OPTION 2
    TRANSLATION OPTION 3   TRANSLATION OPTION 3
That is, for a token t_i I would like to supply my own list of options t_ij.
I would like to continue using the language model scores etc.
I went through the paper on the design of the Moses decoder (*Design of the
Moses Decoder for Statistical Machine Translation*) and the code
documentation available online, but I need some help.
From my understanding, the translation options for each token are stored in
the TranslationOptionCollection data structure (TranslationOptionCollection.cpp).
The Manager class (Manager.cpp) instantiates the TranslationOptionCollection
2-D array, initializes the hypothesis stack, and also initializes the phrase
tables, reordering models and language models.
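A minimal sketch of what a span-indexed "2-D array" of option lists could look like, assuming cells are addressed by start position and span length (capped at a maximum phrase length). The names `SpanOption` and `SpanOptionTable` are hypothetical and are not Moses classes; only the indexing idea is illustrated.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a translation option: a target string and its score.
struct SpanOption {
  std::string target;
  double score;
};

// Hypothetical span-indexed table: cell [start][end - start] holds the
// options covering source words start..end (inclusive).
class SpanOptionTable {
 public:
  SpanOptionTable(std::size_t sentenceLength, std::size_t maxPhraseLength)
      : m_length(sentenceLength),
        m_maxLen(maxPhraseLength),
        m_cells(sentenceLength,
                std::vector<std::vector<SpanOption>>(maxPhraseLength)) {}

  // Option list for the span [start, end]; throws if the span is not stored.
  std::vector<SpanOption>& Get(std::size_t start, std::size_t end) {
    if (start > end || end >= m_length || end - start >= m_maxLen)
      throw std::out_of_range("span outside table");
    return m_cells[start][end - start];
  }

  // Appends one option to the list for [start, end].
  void Add(std::size_t start, std::size_t end, SpanOption opt) {
    Get(start, end).push_back(std::move(opt));
  }

 private:
  std::size_t m_length;
  std::size_t m_maxLen;
  std::vector<std::vector<std::vector<SpanOption>>> m_cells;
};
```

For the purely word-level options described above, only the cells `Get(i, i)` would be filled.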
In Manager.cpp, at line 59:

    const StaticData &staticData = StaticData::Instance();
    staticData.InitializeBeforeSentenceProcessing(source);
Then, in Manager.cpp at line 88:

    const vector<DecodeGraph*> &decodeStepVL = staticData.GetDecodeStepVL();
    m_transOptColl->CreateTranslationOptions(decodeStepVL);
*What are the DecodeGraph and the DecodeStep data structures used for?*
In DecodeStep.cpp, at line 5:

    DecodeStep::DecodeStep(Dictionary *ptr, const DecodeStep* prev)
      : m_ptr(ptr)
    {
      // Start from the factors already produced by the previous step (if any).
      FactorMask prevOutputFactors;
      if (prev) prevOutputFactors = prev->m_outputFactors;
      m_outputFactors = prevOutputFactors;

      // Factors produced both by earlier steps and by this step's table.
      FactorMask conflictMask = (m_outputFactors & ptr->GetOutputFactorMask());
      m_outputFactors |= ptr->GetOutputFactorMask();
      // Factors first produced by this step.
      FactorMask newOutputFactorMask = m_outputFactors ^ prevOutputFactors; //xor

      // Record the indices of the new and the conflicting factors.
      m_newOutputFactors.resize(newOutputFactorMask.count());
      m_conflictFactors.resize(conflictMask.count());
      size_t j=0, k=0;
      for (size_t i = 0; i < MAX_NUM_FACTORS; i++) {
        if (newOutputFactorMask[i]) m_newOutputFactors[j++] = i;
        if (conflictMask[i]) m_conflictFactors[k++] = i;
      }
      VERBOSE(2,"DecodeStep():\n\toutputFactors=" << m_outputFactors
        << "\n\tconflictFactors=" << conflictMask
        << "\n\tnewOutputFactors=" << newOutputFactorMask << std::endl);
    }
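To see what the quoted constructor computes, the sketch below re-creates its mask arithmetic with `std::bitset`. It assumes FactorMask behaves like a bitset of MAX_NUM_FACTORS bits and uses 4 factors; `ComputeStepFactors` and `StepFactors` are hypothetical names. In this re-creation, as in the quoted code, the only input taken from the step's table is its output factor mask.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: 4 factors, standing in for a FactorMask of MAX_NUM_FACTORS bits.
constexpr std::size_t kMaxNumFactors = 4;
using Mask = std::bitset<kMaxNumFactors>;

// Hypothetical result type mirroring the members set by the constructor.
struct StepFactors {
  Mask outputFactors;                        // all factors produced so far
  std::vector<std::size_t> newFactors;       // factors first produced by this step
  std::vector<std::size_t> conflictFactors;  // factors produced again by this step
};

// prevOutput: factors produced by earlier steps (empty for the first step).
// dictOutput: factors this step's table produces.
StepFactors ComputeStepFactors(const Mask& prevOutput, const Mask& dictOutput) {
  StepFactors r;
  const Mask conflict = prevOutput & dictOutput;   // already produced earlier
  r.outputFactors = prevOutput | dictOutput;       // produced after this step
  const Mask newMask = r.outputFactors ^ prevOutput;  // added by this step (xor)
  for (std::size_t i = 0; i < kMaxNumFactors; ++i) {
    if (newMask[i]) r.newFactors.push_back(i);
    if (conflict[i]) r.conflictFactors.push_back(i);
  }
  return r;
}
```

With a surface-only first step followed by a step that outputs the surface factor and factor 1, the second step reports factor 1 as new and factor 0 as a conflict.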
*This seems to do some phrase-table-specific processing, but I am not sure.*
*Considering that I don't need the phrase table and will provide my own
TranslationListOption using my own method, what changes, if any, do I need
to make to DecodeStep?*
After this step, I assume I can continue with the
SearchNormal::ProcessSentence() call made in Manager.cpp (line 100),
without changes, to get the best hypothesis.
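As a deliberately simplified picture of how custom per-token options and LM scores would combine during search, here is a toy monotone beam search (one option per source word, no reordering, no future-cost estimate, no end-of-sentence LM term). It is not SearchNormal; the bigram LM table, the `<s>` symbol, the -10.0 floor for unseen bigrams and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct Option {
  std::string target;
  double score;  // log-domain score supplied with the custom list
};

struct Hyp {
  std::vector<std::string> words;
  double score;
};

// Hypothetical bigram LM: log-prob of (previous word, word).
using BigramLM = std::map<std::pair<std::string, std::string>, double>;

double LmScore(const BigramLM& lm, const std::string& prev, const std::string& word) {
  auto it = lm.find({prev, word});
  return it != lm.end() ? it->second : -10.0;  // assumed floor for unseen bigrams
}

// Monotone beam search: position i is covered by exactly one option from options[i].
Hyp BeamSearch(const std::vector<std::vector<Option>>& options,
               const BigramLM& lm, double lmWeight, std::size_t beamSize) {
  if (beamSize == 0) throw std::invalid_argument("beam size must be positive");
  std::vector<Hyp> beam{Hyp{{}, 0.0}};
  for (const auto& slot : options) {
    if (slot.empty()) throw std::invalid_argument("no options for a source position");
    std::vector<Hyp> next;
    for (const Hyp& h : beam) {
      const std::string prev = h.words.empty() ? "<s>" : h.words.back();
      for (const Option& o : slot) {
        Hyp e = h;
        e.words.push_back(o.target);
        e.score += o.score + lmWeight * LmScore(lm, prev, o.target);
        next.push_back(std::move(e));
      }
    }
    // Histogram pruning: keep only the beamSize best expansions.
    std::sort(next.begin(), next.end(),
              [](const Hyp& a, const Hyp& b) { return a.score > b.score; });
    if (next.size() > beamSize) next.resize(beamSize);
    beam = std::move(next);
  }
  return beam.front();
}
```

With `lmWeight` set to 0 the search simply takes the best-scoring option at each position; a positive weight lets the LM overrule an option that scores well on its own but fits poorly with its neighbour.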
*To summarize:*
*In the decoding step, I would like to change the source of the translation
options used for beam search to my own custom-generated lists.*
I will write some code to populate the TranslationOptionCollection data
structure with lists read from my custom source instead of the phrase table.
I would like to continue using the LM etc. that were used for translation
before.
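One possible shape for the loading step, as a minimal sketch: it assumes a hypothetical plain-text input with one option per line, `tokenIndex ||| target ||| score` (borrowing the `|||` delimiter of Moses phrase tables), and returns one list per source token. Converting these lists into actual Moses TranslationOption objects is not shown, and `LoadOptionLists` / `CustomOption` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical per-token option: target string plus its score.
struct CustomOption {
  std::string target;
  double score;
};

// Strips leading and trailing spaces and tabs.
std::string Trim(const std::string& s) {
  const auto b = s.find_first_not_of(" \t");
  if (b == std::string::npos) return "";
  const auto e = s.find_last_not_of(" \t");
  return s.substr(b, e - b + 1);
}

// Hypothetical format, one option per line: "tokenIndex ||| target ||| score".
// Returns one option list per source token; blank lines are skipped.
std::vector<std::vector<CustomOption>> LoadOptionLists(std::istream& in,
                                                       std::size_t sentenceLength) {
  std::vector<std::vector<CustomOption>> lists(sentenceLength);
  const std::string sep = "|||";
  std::string line;
  while (std::getline(in, line)) {
    if (Trim(line).empty()) continue;
    const auto p1 = line.find(sep);
    const auto p2 =
        (p1 == std::string::npos) ? p1 : line.find(sep, p1 + sep.size());
    if (p2 == std::string::npos) throw std::runtime_error("malformed line: " + line);
    const std::size_t index = std::stoul(Trim(line.substr(0, p1)));
    if (index >= sentenceLength) throw std::out_of_range("token index out of range");
    CustomOption opt{Trim(line.substr(p1 + sep.size(), p2 - p1 - sep.size())),
                     std::stod(Trim(line.substr(p2 + sep.size())))};
    lists[index].push_back(opt);
  }
  return lists;
}
```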
I would greatly appreciate any help/guidance that you may be able to
provide.
Looking forward to hearing from you!
Thank you,
Regards
Danish Contractor
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support