Hi Sylvain
I think ProcessSentence() is the right method to call. If you look at moses
server, you'll see a less cluttered example of how to use the Moses API.
It may be that your moses_get_hyp() is not back-tracking through the
hypotheses correctly.
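To illustrate the back-tracking point, here is a minimal sketch. It does not use the real Moses classes: MockHypothesis, its fields, and both function names are made up for illustration. The assumption is that each hypothesis holds only the phrase it added plus a link to the hypothesis it extends (in Moses that link should be reachable via Hypothesis::GetPrevHypo(), but check the headers of your version). Under that assumption, taking the target phrase of the best hypothesis alone gives just the last phrase, whereas walking the chain gives the full translation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a decoder hypothesis (NOT the Moses class):
// it stores the phrase added at this step and the hypothesis it extends.
struct MockHypothesis {
    std::string targetPhrase;     // phrase added by this hypothesis
    const MockHypothesis* prev;   // nullptr for the initial empty hypothesis
};

// What you get if you only look at the final hypothesis: the last phrase.
std::string last_phrase_only(const MockHypothesis* best) {
    assert(best != nullptr);
    return best->targetPhrase;
}

// Walk back from the final hypothesis to the start, then join the phrases
// in left-to-right order, skipping the empty initial hypothesis.
std::string backtrack(const MockHypothesis* best) {
    std::vector<const MockHypothesis*> chain;
    for (const MockHypothesis* h = best; h != nullptr; h = h->prev) {
        chain.push_back(h);
    }
    std::string out;
    for (auto it = chain.rbegin(); it != chain.rend(); ++it) {
        if ((*it)->targetPhrase.empty()) continue;
        if (!out.empty()) out += ' ';
        out += (*it)->targetPhrase;
    }
    return out;
}
```

If the output of the real code looks like what last_phrase_only() returns here, that would point at the extraction step rather than at the decoding itself.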
Note that you are calling UntransformScore(), which probably explains your odd
translation score. It doesn't make much sense to do this, as the result is not
a probability (the score is not normalised). It is unusual, though, that you
appear to have a positive translation score (in log space).
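As a quick sanity check on the numbers, here is a small sketch. It assumes that UntransformScore() amounts to a plain exp() of a natural-log score; the two helper names below are hypothetical and are not the Moses functions. Under that assumption, the reported score of 903011968 corresponds to a log-space score of roughly 20.6, which is positive, whereas a log-probability would be at most 0.

```cpp
#include <cassert>
#include <cmath>

// Assumed behaviour of the "untransform" step: map a log-space score
// back to linear space with exp(). This is a stand-in, not the Moses code.
double untransform(double logScore) {
    return std::exp(logScore);
}

// Inverse mapping: recover the log-space score from the linear value.
double transform(double score) {
    assert(score > 0.0);  // log is only defined for positive values
    return std::log(score);
}

// A score behaves like a log-probability only if it is <= 0, i.e. if its
// untransformed value lies in (0, 1].
bool looks_like_log_probability(double logScore) {
    return logScore <= 0.0;
}
```

Calling transform(903011968.0) recovers the log-space score behind the value in the original report, and looks_like_log_probability() on that result is false.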
If you increase the verbosity of moses (to 2 or 3) you'll get a better idea
of what it is doing, and you can see whether it really is producing "of" as
the translation, and why.
cheers - Barry
On Thursday 26 April 2012 16:41:06 Sylvain Raybaud wrote:
> wild guessing here: in TranslationTask::Run, I see there are many
> alternatives for processing the sentence, like doLatticeMBR etc, not
> just running Manager::ProcessSentence()
> Maybe one of these alternatives must be run for processing confusion
> networks?
>
> cheers
>
> Sylvain
>
> On 26/04/12 15:53, Sylvain Raybaud wrote:
> > Hi Barry
> >
> > Thanks for the tip, that sounds likely indeed. I'll try it again but
> > last time I ran the software through valgrind, I got so many errors in
> > external libs that I just gave up.
> >
> > In the meantime, here is the complete function that handles the
> > decoding, in case someone sees something obviously wrong in here...
> >
> > static void moses_translate_phonemes(manager_data_t * pool,
> > translation_pair_t * pair) {
> > debug("starting");
> >
> > const TranslationSystem& system =
> > StaticData::Instance().GetTranslationSystem(TranslationSystem::DEFAULT);
> > /* there is only one translation system for now */
> > const StaticData &staticData = StaticData::Instance();
> > const vector<FactorType> &inputFactorOrder =
> > staticData.GetInputFactorOrder();
> >
> > MyConfusionNet * cn =
> > phonemes_to_cn(pool->mp_engine->phonemes_cm, pair->source->phonemes,
> > pool->mp_config->cn_width, pool->mp_config->cn_thresh, inputFactorOrder);
> >
> > Manager * manager = new Manager(*cn,staticData.GetSearchAlgorithm(),
> > &system);
> > manager->ProcessSentence();
> > const Hypothesis* hypo = manager->GetBestHypothesis();
> >
> > string hyp = moses_get_hyp(hypo);
> > char * hyp_ret = (char*)malloc((strlen(hyp.c_str())+1)*sizeof(char));
> > strcpy(hyp_ret,hyp.c_str());
> >
> > pair->translation_score = UntransformScore(hypo->GetScore());
> > translation_pair_set_target(pair, hyp_ret,NULL);
> >
> > delete manager;
> > delete cn;
> >
> > }
> >
> > cheers,
> >
> > Sylvain
> >
> > On 26/04/12 13:49, Barry Haddow wrote:
> >> Hi Sylvain
> >>
> >> I'm not familiar with this part of the code, but the strange score
> >> suggests that there's some uninitialised memory. You could try running
> >> through valgrind and it might give some clues,
> >>
> >> cheers - Barry
> >>
> >> On Thursday 26 Apr 2012 12:24:11 Sylvain Raybaud wrote:
> >>> Hi all
> >>>
> >>> I'm using the Moses API for decoding a confusion network. The CN is
> >>> created from the output of an ASR engine and a confusion matrix. More
> >>> precisely (even though it's probably irrelevant to my problem), the ASR
> >>> engine provides a string of phonemes (1-best) and the confusion matrix
> >>> provides alternatives for each phoneme (the idea was described in
> >>> Jiang et al., _Phonetic representation based speech translation_, MT
> >>> Summit XIII, 2011).
> >>>
> >>> When the CN is dumped into a file and I use
> >>> moses -f moses.phonemes.cn.ini < CN
> >>> to decode it, everything is fine.
> >>>
> >>> But when I use the Moses API (loading the same configuration file), I
> >>> get incomplete translations, like:
> >>>
> >>> ASR output (French): "nous font sont toujours chimistes plume
> >>> rassembleront ch je trouve que le office de ce tout de suite"
> >>> Phonetic representation: "n u f on s on t t u ge u r ch i m i s t z p l
> >>> y m r a s an b l swa r on ch ge swa t r u v k swa l swa oh f i s swa d
> >>> swa s swa t u d s h i t"
> >>> Translation: "of"
> >>> score: 903011968.000000
> >>>
> >>> Note that the transcription is poor (I haven't really tuned the ASR
> >>> engine), but still, the translation ought to be more than just "of".
> >>> Sometimes it's several words, I guess it's a phrase in the phrase
> >>> table. The word generally seems to be the translation of a word in the
> >>> source sentence.
> >>> When I use moses on the command line to translate either the 1-best or
> >>> the CN, I get a reasonable translation. When I use the API to translate
> >>> the 1-best phonetic representation, I also get a reasonable
> >>> translation. I think the CN object is created correctly because moses
> >>> loads it and prints it prior to decoding (this is normal verbose
> >>> behavior). I also tried to create a PCN object, and got exactly the
> >>> same results. So I guess the problem is either how I tell moses to
> >>> decode it or how I extract the result from the Hypothesis object. But
> >>> I'm clueless about what the problem is here, since the code is
> >>> working when I just translate a string. The translation score seems
> >>> ridiculously high too. I'll give the corresponding code below.
> >>>
> >>> Decoding and hypothesis extraction:
> >>> ***********************************
> >>> [...]
> >>> Manager * manager = new Manager(*cn,staticData.GetSearchAlgorithm(),
> >>> &system);
> >>> manager->ProcessSentence();
> >>> const Hypothesis* hypo = manager->GetBestHypothesis();
> >>> string hyp = moses_get_hyp(hypo);
> >>> [...]
> >>> pair->translation_score = UntransformScore(hypo->GetScore());
> >>> [...]
> >>>
> >>> string moses_get_hyp(const Hypothesis* hypo) {
> >>> return hypo->GetTargetPhraseStringRep();
> >>> }
> >>>
> >>>
> >>> Creation of the CN:
> >>> *******************
> >>>
> >>> /** new class derived from ConfusionNet, with a new method for directly
> >>> creating CN */
> >>> class MyConfusionNet : public ConfusionNet {
> >>> public:
> >>> void addCol(Column);
> >>> };
> >>>
> >>> void MyConfusionNet::addCol(Column col) {
> >>> data.push_back(col);
> >>> }
> >>>
> >>> /** create a column of the CN */
> >>> static MyConfusionNet::Column create_phoneme_col(confusion_matrix_t *
> >>> cm, const char * ph, int width, double thresh, const vector<FactorType>
> >>> &factor_order) {
> >>>
> >>> MyConfusionNet::Column col;
> >>>
> >>> phoneme_conf_t * ph_conf =
> >>> (phoneme_conf_t*)g_hash_table_lookup(cm->matrix,ph);
> >>> if(ph_conf==NULL) {
> >>> return col;
> >>> }
> >>>
> >>> int i;
> >>> for(i = 0; i<cm->n_phonemes; i++) {
> >>> vector<float> scores;
> >>> float score = float(ph_conf[i].p);
> >>> if((width<=0 || i<width) && (thresh<=0 || score>=thresh)) {
> >>> string wd(cm->phonemes[ph_conf[i].phoneme]);
> >>> Word word;
> >>> word.CreateFromString(Input,factor_order,wd,false);
> >>> scores.push_back(score);
> >>> pair<Word,vector<float> > linkdata(word,scores);
> >>> col.push_back(linkdata);
> >>> }
> >>> }
> >>>
> >>> return col;
> >>> }
> >>>
> >>> /** Creates a confusion network from a NULL terminated phonemes list
> >>> and a phonemes confusion matrix */
> >>> static MyConfusionNet * phonemes_to_cn(confusion_matrix_t * cm,const
> >>> char ** phonemes, int width, double thresh, const vector<FactorType>
> >>> &factor_order) {
> >>> debug("start");
> >>>
> >>> MyConfusionNet * cn = new MyConfusionNet();
> >>>
> >>> int i = 0;
> >>> while(phonemes[i]!=NULL) {
> >>> debug("%s",phonemes[i]);
> >>> MyConfusionNet::Column col =
> >>> create_phoneme_col(cm,phonemes[i],width,thresh,factor_order);
> >>> cn->addCol(col);
> >>> i += 1;
> >>> }
> >>>
> >>> return cn;
> >>> }
> >>>
> >>> So, if anyone has an idea about what's wrong here.... thanks!
> >>>
> >>> cheers,
>
--
Barry Haddow
University of Edinburgh
+44 (0) 131 651 3173