Hey Danish,
There's the XML input option, which you can use to supply custom translations:
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc4
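As a rough sketch of how the XML route could look for your per-token lists: the helper below turns a tokenised sentence plus candidate translations into one marked-up input line. The `translation` and `prob` attributes with `||` between alternatives are how I remember the markup from the page above, so please check them against your 2009-04-13 version. The tag name `w` and the names `Candidate`, `TokenOptions`, `MarkUpToken` and `MarkUpSentence` are made up for this example and are not part of Moses.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One candidate translation and its probability (hypothetical type).
struct Candidate {
    std::string target;
    double prob;
};

// A source token with its list of candidates (hypothetical type).
struct TokenOptions {
    std::string source;
    std::vector<Candidate> candidates;
};

// Wrap one token in an XML tag carrying its candidate translations.
// Assumption: the strings contain no '"', '<', '>' or "||"; this sketch
// does no escaping.
std::string MarkUpToken(const TokenOptions& token) {
    if (token.candidates.empty()) return token.source;  // leave it to the phrase table
    std::ostringstream translations, probs;
    for (std::size_t i = 0; i < token.candidates.size(); ++i) {
        const Candidate& c = token.candidates[i];
        assert(c.target.find('"') == std::string::npos);
        if (i > 0) {
            translations << "||";
            probs << "||";
        }
        translations << c.target;
        probs << c.prob;
    }
    return "<w translation=\"" + translations.str() + "\" prob=\"" +
           probs.str() + "\">" + token.source + "</w>";
}

// Join the marked-up tokens into one decoder input line.
std::string MarkUpSentence(const std::vector<TokenOptions>& tokens) {
    std::string line;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (i > 0) line += ' ';
        line += MarkUpToken(tokens[i]);
    }
    return line;
}
```

You would write the resulting lines to the decoder's input and turn on XML handling with the `-xml-input` switch (I believe `exclusive` is the setting that makes the decoder use only the supplied options for a marked span, but check the page above). The language model is still applied as usual.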
If you want to create your own translation options, a good place to look is
TranslationOptionCollection::ProcessOneUnknownWord()
(TranslationOptionCollection.cpp, line 211),
which shows how translation options can be created without a phrase table.
The one tricky thing when creating your own translation options is to make
sure that the phrases they use aren't cleaned up while decoding is still in
progress, and are freed once decoding of the sentence completes. This is
normally handled by the phrase table class. In unknown word processing,
the data used is put into m_unksrcs, which is cleaned up after decoding.
You have to do something similar.
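To make the ownership idea concrete, here is a minimal, self-contained sketch. It does not use any Moses classes: `ToyPhrase`, `ToyTransOpt` and `SentenceScopedStore` are stand-ins invented for this example. The store owns the phrases for one sentence, the translation options only borrow pointers to them, and everything is released in one call after decoding.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a phrase object; NOT the Moses Phrase class.
struct ToyPhrase {
    static int live;  // number of ToyPhrase objects currently alive
    std::string words;
    explicit ToyPhrase(std::string w) : words(std::move(w)) { ++live; }
    ~ToyPhrase() { --live; }
    ToyPhrase(const ToyPhrase&) = delete;
    ToyPhrase& operator=(const ToyPhrase&) = delete;
};
int ToyPhrase::live = 0;

// Stand-in for a translation option: it borrows its phrases, it does not own them.
struct ToyTransOpt {
    const ToyPhrase* source;
    const ToyPhrase* target;
};

// Owns every phrase created for the current sentence.
class SentenceScopedStore {
public:
    // Create a phrase and hand back a pointer that stays valid until
    // CleanUpAfterSentence() is called (the objects live on the heap,
    // so growing the vector does not move them).
    const ToyPhrase* Keep(std::string words) {
        assert(!words.empty());
        m_phrases.push_back(std::make_unique<ToyPhrase>(std::move(words)));
        return m_phrases.back().get();
    }

    // Call once decoding of the sentence has finished, never earlier.
    void CleanUpAfterSentence() { m_phrases.clear(); }

    std::size_t Size() const { return m_phrases.size(); }

private:
    std::vector<std::unique_ptr<ToyPhrase>> m_phrases;
};
```

In the decoder the role of this store is played by a member such as m_unksrcs; the point of the sketch is only that creation and clean-up are tied to the sentence, not to the individual translation option.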
The DecodeGraph and DecodeStep classes are used to create systems with multiple
phrase tables and generation tables. You don't need to touch these for now.
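For orientation only: the factor bookkeeping in the DecodeStep constructor you quoted comes down to three bitset operations. The standalone sketch below reproduces that logic with std::bitset. `kMaxFactors`, `FactorBookkeeping` and `Combine` are names invented for this example, and the mask width of 4 is an arbitrary assumption.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kMaxFactors = 4;  // assumption; stands in for MAX_NUM_FACTORS
using FactorMask = std::bitset<kMaxFactors>;

// Result of adding one table's output factors to those of the previous steps.
struct FactorBookkeeping {
    FactorMask outputFactors;                   // everything produced so far
    std::vector<std::size_t> newOutputFactors;  // produced for the first time by this step
    std::vector<std::size_t> conflictFactors;   // produced before AND by this step
};

FactorBookkeeping Combine(const FactorMask& prevOutput, const FactorMask& tableOutput) {
    FactorBookkeeping r;
    const FactorMask conflict = prevOutput & tableOutput;    // overlap with earlier steps
    r.outputFactors = prevOutput | tableOutput;              // union of all output factors
    const FactorMask fresh = r.outputFactors ^ prevOutput;   // xor: bits added by this step
    for (std::size_t i = 0; i < kMaxFactors; ++i) {
        if (fresh[i]) r.newOutputFactors.push_back(i);
        if (conflict[i]) r.conflictFactors.push_back(i);
    }
    assert(r.newOutputFactors.size() == fresh.count());
    return r;
}
```

With a single table that outputs only the surface factor, there is no previous step, so nothing conflicts and factor 0 is simply new. That is why you can leave this code alone.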
Hieu
PS. What's your real name?
PPS. Please subscribe to the mailing list before posting to it,
http://mailman.mit.edu/mailman/listinfo/moses-support
otherwise your mail will keep bouncing.
On 10/01/2010 15:42, Danish Contractor wrote:
Hi!
I am trying to tweak the Moses code (2009-04-13 version) to use a
supplied list of custom translation options instead of depending on
the phrase translation tables.
For example, for the input "This is a house",
I would like to supply word-level options for each token:
This                    is                      ... and so on
TRANSLATION OPTION 1    TRANSLATION OPTION 1
TRANSLATION OPTION 2    TRANSLATION OPTION 2
TRANSLATION OPTION 3    TRANSLATION OPTION 3
I.e., for a token t_i I would like to supply my own list of t_ij.
I would like to continue using the language model scores etc.
I went through the paper on the design of the Moses decoder ("Design
of the Moses Decoder for Statistical Machine Translation") and the
code documentation available online, but I need some help.
From my understanding, the translation options for each token are
stored in the TranslationOptionCollection data structure
(TranslationOptionCollection.cpp).
The Manager class (Manager.cpp) instantiates the TranslationOptionCollection
2-D array, initializes the hypothesis stack, and also initializes the
phrase tables, reordering models and language models.
In Manager.cpp
Line 59:
const StaticData &staticData = StaticData::Instance();
staticData.InitializeBeforeSentenceProcessing(source);
However,
In Manager.cpp
at Line 88:
const vector<DecodeGraph*> &decodeStepVL = staticData.GetDecodeStepVL();
m_transOptColl->CreateTranslationOptions(decodeStepVL);
*What are the DecodeGraph and the DecodeStep data structures used for?*
DecodeStep.cpp
Line 5
DecodeStep::DecodeStep(Dictionary *ptr, const DecodeStep* prev)
  : m_ptr(ptr)
{
  FactorMask prevOutputFactors;
  if (prev) prevOutputFactors = prev->m_outputFactors;
  m_outputFactors = prevOutputFactors;
  FactorMask conflictMask = (m_outputFactors & ptr->GetOutputFactorMask());
  m_outputFactors |= ptr->GetOutputFactorMask();
  FactorMask newOutputFactorMask = m_outputFactors ^ prevOutputFactors; //xor
  m_newOutputFactors.resize(newOutputFactorMask.count());
  m_conflictFactors.resize(conflictMask.count());
  size_t j=0, k=0;
  for (size_t i = 0; i < MAX_NUM_FACTORS; i++) {
    if (newOutputFactorMask[i]) m_newOutputFactors[j++] = i;
    if (conflictMask[i]) m_conflictFactors[k++] = i;
  }
  VERBOSE(2,"DecodeStep():\n\toutputFactors=" << m_outputFactors
    << "\n\tconflictFactors=" << conflictMask
    << "\n\tnewOutputFactors=" << newOutputFactorMask << std::endl);
}
*This seems to do some phrase table specific processing but I am not
sure.*
*Considering I don't need the phrase table and will provide my own
TranslationListOption using my own method, what changes do I need to
make to DecodeStep, if any?*
After this step I guess I can continue with the
SearchNormal::ProcessSentence() call that is made in Manager.cpp (Line
100) without changes to get the best hypothesis.
*To summarize:*
In the decoding step, I would like to change the source of the
translation options used for beam search to my own custom-generated
lists.
I will write some code to populate the TranslationOptionCollection data
structure from my custom source instead of the phrase table.
I would like to continue using the LM etc. that were used earlier for the
translation.
I would greatly appreciate any help/guidance that you may be able to
provide.
Looking forward to hearing from you!
Thank You,
Regards
Danish Contractor