Wild guess here: in TranslationTask::Run, I see there are many
alternatives for processing the sentence, like doLatticeMBR etc., not
just running Manager::ProcessSentence().
Maybe one of these alternatives must be run for processing confusion
networks?
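
Another guess, about the odd score this time. As far as I can tell,
UntransformScore() is just exp(), so a reported score of 903011968 would
correspond to a log score of about +20.6, which looks wrong for something
built from log-probabilities. I also fill the CN columns with raw
probabilities from the confusion matrix, whereas I suspect (not verified)
that the reader used by the moses binary stores floored log-probabilities.
Here is a minimal standalone sketch of the conversion I have in mind.
kLowestScore, prob_to_log_score and implied_log_score are hypothetical
names of mine, not Moses functions, and the -100 floor is an assumption:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Assumed floor for log scores (hypothetical constant, not taken from Moses).
const float kLowestScore = -100.0f;

// Convert a link probability into a floored natural-log score.
// Values outside [0,1] are clamped first.
float prob_to_log_score(double prob) {
  prob = std::min(1.0, std::max(0.0, prob));
  if (prob == 0.0) return kLowestScore;  // avoid log(0) = -inf
  return std::max(static_cast<float>(std::log(prob)), kLowestScore);
}

// Recover the log score behind an "untransformed" score,
// assuming the untransform step is a plain exp().
float implied_log_score(float untransformed) {
  assert(untransformed > 0.0f);  // exp() never yields zero or negatives
  return std::log(untransformed);
}
```

If this turns out to be the issue, create_phoneme_col would push
prob_to_log_score(ph_conf[i].p) instead of the raw value, but I haven't
checked that yet.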
cheers
Sylvain
On 26/04/12 15:53, Sylvain Raybaud wrote:
> Hi Barry
>
> Thanks for the tip, that sounds likely indeed. I'll try it again but
> last time I ran the software through valgrind, I got so many errors in
> external libs that I just gave up.
>
> In the meantime, here is the complete function that handles the
> decoding, in case someone sees something obviously wrong in here...
>
> static void moses_translate_phonemes(manager_data_t * pool,
> translation_pair_t * pair) {
> debug("starting");
>
> const TranslationSystem& system =
> StaticData::Instance().GetTranslationSystem(TranslationSystem::DEFAULT);
> /* there is only one translation system for now */
> const StaticData &staticData = StaticData::Instance();
> const vector<FactorType> &inputFactorOrder =
> staticData.GetInputFactorOrder();
>
> MyConfusionNet * cn =
> phonemes_to_cn(pool->mp_engine->phonemes_cm,pair->source->phonemes,pool->mp_config->cn_width,pool->mp_config->cn_thresh,inputFactorOrder);
>
> Manager * manager = new Manager(*cn,staticData.GetSearchAlgorithm(),
> &system);
> manager->ProcessSentence();
> const Hypothesis* hypo = manager->GetBestHypothesis();
>
> string hyp = moses_get_hyp(hypo);
> char * hyp_ret = (char*)malloc((strlen(hyp.c_str())+1)*sizeof(char));
> strcpy(hyp_ret,hyp.c_str());
>
> pair->translation_score = UntransformScore(hypo->GetScore());
> translation_pair_set_target(pair, hyp_ret,NULL);
>
> delete manager;
> delete cn;
>
> }
>
> cheers,
>
> Sylvain
>
> On 26/04/12 13:49, Barry Haddow wrote:
>> Hi Sylvain
>>
>> I'm not familiar with this part of the code, but the strange score suggests
>> that there's some uninitialised memory. You could try running through
>> valgrind and it might give some clues,
>>
>> cheers - Barry
>>
>> On Thursday 26 Apr 2012 12:24:11 Sylvain Raybaud wrote:
>>> Hi all
>>>
>>> I'm using the Moses API for decoding a confusion network. The CN is
>>> created from the output of an ASR engine and a confusion matrix. More
>>> precisely (even though it's probably irrelevant to my problem), the ASR
>>> engine provides a string of phonemes (1-best) and the confusion matrix
>>> provides alternatives for each phoneme (the idea was described in Jiang
>>> et al., _Phonetic representation based speech translation_, MT Summit
>>> XIII, 2011).
>>>
>>> When the CN is dumped into a file and I use
>>> moses -f moses.phonemes.cn.ini < CN
>>> to decode it, everything is fine.
>>>
>>> But when I use the Moses API (loading the same configuration file), I get
>>> incomplete translations, like:
>>>
>>> ASR output (French): "nous font sont toujours chimistes plume
>>> rassembleront ch je trouve que le office de ce tout de suite"
>>> Phonetic representation: "n u f on s on t t u ge u r ch i m i s t z p l
>>> y m r a s an b l swa r on ch ge swa t r u v k swa l swa oh f i s swa d
>>> swa s swa t u d s h i t"
>>> Translation: "of"
>>> score: 903011968.000000
>>>
>>> Note that the transcription is poor (I haven't really tuned the ASR
>>> engine), but still, the translation ought to be more than just "of".
>>> Sometimes it's several words; I guess it's a phrase in the phrase table.
>>> The output generally seems to be the translation of a word in the source
>>> sentence.
>>> When I use moses on the command line to translate either the 1-best or
>>> the CN, I get a reasonable translation. When I use the API to translate
>>> the 1-best phonetic representation, I also get a reasonable translation.
>>> I think the CN object is created correctly because moses loads it and
>>> prints it prior to decoding (this is normal verbose behavior). I also
>>> tried to create a PCN object, and got exactly the same results. So I
>>> guess the problem is either how I tell moses to decode it or how I
>>> extract the result from the Hypothesis object. But I'm clueless about
>>> what the problem is here, since the code works when I just
>>> translate a string. The translation score seems ridiculously high too.
>>> I'll give the corresponding code below.
>>>
>>> Decoding and hypothesis extraction:
>>> ***********************************
>>> [...]
>>> Manager * manager = new Manager(*cn,staticData.GetSearchAlgorithm(),
>>> &system);
>>> manager->ProcessSentence();
>>> const Hypothesis* hypo = manager->GetBestHypothesis();
>>> string hyp = moses_get_hyp(hypo);
>>> [...]
>>> pair->translation_score = UntransformScore(hypo->GetScore());
>>> [...]
>>>
>>> string moses_get_hyp(const Hypothesis* hypo) {
>>> return hypo->GetTargetPhraseStringRep();
>>> }
>>>
>>>
>>> Creation of the CN:
>>> *******************
>>>
>>> /** new class derived from ConfusionNet, with a new method for directly
>>> creating CN */
>>> class MyConfusionNet : public ConfusionNet {
>>> public:
>>> void addCol(Column);
>>> };
>>>
>>> void MyConfusionNet::addCol(Column col) {
>>> data.push_back(col);
>>> }
>>>
>>> /** create a column of the CN */
>>> static MyConfusionNet::Column create_phoneme_col(confusion_matrix_t *
>>> cm, const char * ph, int width, double thresh, const vector<FactorType>
>>> &factor_order) {
>>>
>>> MyConfusionNet::Column col;
>>>
>>> phoneme_conf_t * ph_conf =
>>> (phoneme_conf_t*)g_hash_table_lookup(cm->matrix,ph);
>>> if(ph_conf==NULL) {
>>> return col;
>>> }
>>>
>>> int i;
>>> for(i = 0; i<cm->n_phonemes; i++) {
>>> vector<float> scores;
>>> float score = float(ph_conf[i].p);
>>> if((width<=0 || i<width) && (thresh<=0 || score>=thresh)) {
>>> string wd(cm->phonemes[ph_conf[i].phoneme]);
>>> Word word;
>>> word.CreateFromString(Input,factor_order,wd,false);
>>> scores.push_back(score);
>>> pair<Word,vector<float> > linkdata(word,scores);
>>> col.push_back(linkdata);
>>> }
>>> }
>>>
>>> return col;
>>> }
>>>
>>> /** Creates a confusion network from a NULL terminated phonemes list and
>>> a phonemes confusion matrix */
>>> static MyConfusionNet * phonemes_to_cn(confusion_matrix_t * cm,const
>>> char ** phonemes, int width, double thresh, const vector<FactorType>
>>> &factor_order) {
>>> debug("start");
>>>
>>> MyConfusionNet * cn = new MyConfusionNet();
>>>
>>> int i = 0;
>>> while(phonemes[i]!=NULL) {
>>> debug("%s",phonemes[i]);
>>> MyConfusionNet::Column col =
>>> create_phoneme_col(cm,phonemes[i],width,thresh,factor_order);
>>> cn->addCol(col);
>>> i += 1;
>>> }
>>>
>>> return cn;
>>> }
>>>
>>> So, if anyone has an idea about what's wrong here.... thanks!
>>>
>>> cheers,
>>>
>
>
--
Sylvain Raybaud
_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support