Hi all
So, here is the answer. To extract the string from the Hypothesis
object, I used to call only this method:
Hypothesis::GetTargetPhraseStringRep();
For some reason, it seems to work when translating a string but not when
translating a CN or a lattice. I now use the following function
(inspired by what I found in mosesserver.cpp):
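string moses_get_hyp(const Hypothesis* hypo) {
  // Words of the phrase added by this hypothesis (factor 0, normally the
  // surface form), each followed by a space
  string current("");
  const Phrase &p = hypo->GetCurrTargetPhrase();
  for (size_t pos = 0; pos < p.GetSize(); pos++) {
    const Factor *factor = p.GetFactor(pos, 0);
    current += factor->GetString() + string(" ");
  }
  // Back-track: prepend the output of all earlier hypotheses.
  // "current" already ends with a space, so no extra separator is needed.
  const Hypothesis *prev = hypo->GetPrevHypo();
  if (prev != NULL)
    return moses_get_hyp(prev) + current;
  return current;
}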
I must confess that I don't really understand what I'm doing :( I'm just
copying code that works, and, well, that works.
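For what it's worth, the back-tracking idea can be tried outside Moses. The sketch below uses a made-up Hyp struct and a made-up join_hypothesis() function as stand-ins (they are not the Moses classes or API). It only illustrates following the "previous hypothesis" links back to the initial hypothesis and then joining the phrases in left-to-right order:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a decoder hypothesis. This is NOT the Moses class, only the
// two pieces of information the back-tracking needs.
struct Hyp {
  std::vector<std::string> words;  // target words added by this hypothesis
  const Hyp* prev;                 // previous hypothesis, nullptr for the initial one
};

// Follow the prev links from the best hypothesis back to the initial one,
// then emit the collected phrases oldest first, separated by single spaces.
std::string join_hypothesis(const Hyp* best) {
  std::vector<const Hyp*> chain;
  for (const Hyp* h = best; h != nullptr; h = h->prev) {
    chain.push_back(h);
  }

  std::string out;
  for (auto it = chain.rbegin(); it != chain.rend(); ++it) {
    for (const std::string& w : (*it)->words) {
      if (!out.empty()) out += ' ';
      out += w;
    }
  }
  return out;
}
```

Returning only the words of the last hypothesis in such a chain would give just the final phrase, which looks a lot like the one-word output I was getting. The loop version also avoids the recursion and the stray spaces.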
cheers,
Sylvain
On 27/04/12 13:11, Sylvain Raybaud wrote:
> Hi Barry
>
> By adding
> cerr << "[S2TT] GOT TRANSLATION: " << *hypo << endl;
>
> I was able to determine that the translations that are actually generated
> look reasonable. The problem therefore lies in how I extract them from the
> "hypo" object. I think I'll be able to find the problem. I'll let the
> list know.
>
> thanks for the help!
>
> cheers,
>
> Sylvain
>
> On 26/04/12 17:54, Barry Haddow wrote:
>> Hi Sylvain
>>
>> I think ProcessSentence() is the right method to call. If you look at moses
>> server then you'll see a less cluttered example of how to use the Moses api.
>> It may be your moses_get_hyp() is not back-tracking through the hypothesis
>> correctly.
>>
>> Note that you are calling UntransformScore() which probably explains your
>> odd translation score. It doesn't make much sense to do this, as you won't
>> get a probability (it's not normalised). It is unusual though, that you
>> appear to have a positive translation score (in log space).
>>
>> If you increase the verbosity of moses (to 2 or 3) you'll get a better idea
>> what it is doing, and you can see whether it really is producing "of" as the
>> translation, and why.
>>
>> cheers - Barry
>>
>> On Thursday 26 April 2012 16:41:06 Sylvain Raybaud wrote:
>>> wild guessing here: in TranslationTask::Run, I see there are many
>>> alternatives for processing the sentence, like doLatticeMBR etc, not
>>> just running Manager::ProcessSentence()
>>> Maybe one of these alternatives must be run for processing confusion
>>> networks?
>>>
>>> cheers
>>>
>>> Sylvain
>>>
>>> On 26/04/12 15:53, Sylvain Raybaud wrote:
>>>> Hi Barry
>>>>
>>>> Thanks for the tip, that sounds likely indeed. I'll try it again but
>>>> last time I ran the software through valgrind, I got so many errors in
>>>> external libs that I just gave up.
>>>>
>>>> In the meantime, here is the complete function that handles the
>>>> decoding, in case someone sees something obviously wrong in here...
>>>>
>>>> static void moses_translate_phonemes(manager_data_t * pool,
>>>> translation_pair_t * pair) {
>>>> debug("starting");
>>>>
>>>> const TranslationSystem& system =
>>>> StaticData::Instance().GetTranslationSystem(TranslationSystem::DEFAULT);
>>>> /* there is only one translation system for now */
>>>> const StaticData &staticData = StaticData::Instance();
>>>> const vector<FactorType> &inputFactorOrder =
>>>> staticData.GetInputFactorOrder();
>>>>
>>>> MyConfusionNet * cn =
>>>> phonemes_to_cn(pool->mp_engine->phonemes_cm,pair->source->phonemes,pool->
>>>> mp_config->cn_width,pool->mp_config->cn_thresh,inputFactorOrder);
>>>>
>>>> Manager * manager = new Manager(*cn,staticData.GetSearchAlgorithm(),
>>>> &system);
>>>> manager->ProcessSentence();
>>>> const Hypothesis* hypo = manager->GetBestHypothesis();
>>>>
>>>> string hyp = moses_get_hyp(hypo);
>>>> char * hyp_ret = (char*)malloc((strlen(hyp.c_str())+1)*sizeof(char));
>>>> strcpy(hyp_ret,hyp.c_str());
>>>>
>>>> pair->translation_score = UntransformScore(hypo->GetScore());
>>>> translation_pair_set_target(pair, hyp_ret,NULL);
>>>>
>>>> delete manager;
>>>> delete cn;
>>>>
>>>> }
>>>>
>>>> cheers,
>>>>
>>>> Sylvain
>>>>
>>>> On 26/04/12 13:49, Barry Haddow wrote:
>>>>> Hi Sylvain
>>>>>
>>>>> I'm not familiar with this part of the code, but the strange score
>>>>> suggests that there's some uninitialised memory. You could try running
>>>>> through valgrind and it might give some clues,
>>>>>
>>>>> cheers - Barry
>>>>>
>>>>> On Thursday 26 Apr 2012 12:24:11 Sylvain Raybaud wrote:
>>>>>> Hi all
>>>>>>
>>>>>> I'm using Moses API for decoding a confusion network. The CN is
>>>>>> created from the output of an ASR engine and a confusion matrix. More
>>>>>> precisely (even though it's probably irrelevant to my problem), the ASR
>>>>>> engine provides a string of phonemes (1-best) and the confusion matrix
>>>>>> provides alternatives for each phoneme (the idea was described in
>>>>>> Jiang et al., _Phonetic representation based speech translation_, MT
>>>>>> Summit XIII, 2011).
>>>>>>
>>>>>> When the CN is dumped into a file and I use
>>>>>> moses -f moses.phonemes.cn.ini < CN
>>>>>> to decode it, everything is fine.
>>>>>>
>>>>>> But when I use Moses API (loading the same configuration file), I get
>>>>>> incomplete translations, like:
>>>>>>
>>>>>> ASR output (French): "nous font sont toujours chimistes plume
>>>>>> rassembleront ch je trouve que le office de ce tout de suite"
>>>>>> Phonetic representation: "n u f on s on t t u ge u r ch i m i s t z p l
>>>>>> y m r a s an b l swa r on ch ge swa t r u v k swa l swa oh f i s swa d
>>>>>> swa s swa t u d s h i t"
>>>>>> Translation: "of"
>>>>>> score: 903011968.000000
>>>>>>
>>>>>> Note that the transcription is poor (I haven't really tuned the ASR
>>>>>> engine), but still, the translation ought to be more than just "of".
>>>>>> Sometimes it's several words, I guess it's a phrase in the phrase
>>>>>> table. The word generally seems to be the translation of a word in the
>>>>>> source sentence.
>>>>>> When I use moses on the command line to translate either the 1-best or
>>>>>> the CN, I get a reasonable translation. When I use the API to translate
>>>>>> the 1-best phonetic representation, I also get a reasonable
>>>>>> translation. I think the CN object is created correctly because moses
>>>>>> loads it and prints it prior to decoding (this is normal verbose
>>>>>> behavior). I also tried to create a PCN object, and got exactly the
>>>>>> same results. So I guess the problem is either how I tell moses to
>>>>>> decode it or how I extract the result from the Hypothesis object. But
>>>>>> I'm clueless about what the problem is here, since the code is
>>>>>> working when I just translate a string. The translation score seems
>>>>>> ridiculously high too. I'll give below the corresponding code.
>>>>>>
>>>>>> Decoding and hypothesis extraction:
>>>>>> ***********************************
>>>>>> [...]
>>>>>> Manager * manager = new Manager(*cn,staticData.GetSearchAlgorithm(),
>>>>>> &system);
>>>>>> manager->ProcessSentence();
>>>>>> const Hypothesis* hypo = manager->GetBestHypothesis();
>>>>>> string hyp = moses_get_hyp(hypo);
>>>>>> [...]
>>>>>> pair->translation_score = UntransformScore(hypo->GetScore());
>>>>>> [...]
>>>>>>
>>>>>> string moses_get_hyp(const Hypothesis* hypo) {
>>>>>> return hypo->GetTargetPhraseStringRep();
>>>>>> }
>>>>>>
>>>>>>
>>>>>> Creation of the CN:
>>>>>> *******************
>>>>>>
>>>>>> /** new class derived from ConfusionNet, with a new method for directly
>>>>>> creating CN */
>>>>>> class MyConfusionNet : public ConfusionNet {
>>>>>> public:
>>>>>> void addCol(Column);
>>>>>> };
>>>>>>
>>>>>> void MyConfusionNet::addCol(Column col) {
>>>>>> data.push_back(col);
>>>>>> }
>>>>>>
>>>>>> /** create a column of the CN */
>>>>>> static MyConfusionNet::Column create_phoneme_col(confusion_matrix_t *
>>>>>> cm, const char * ph, int width, double thresh, const vector<FactorType>
>>>>>> &factor_order) {
>>>>>>
>>>>>> MyConfusionNet::Column col;
>>>>>>
>>>>>> phoneme_conf_t * ph_conf =
>>>>>> (phoneme_conf_t*)g_hash_table_lookup(cm->matrix,ph);
>>>>>> if(ph_conf==NULL) {
>>>>>> return col;
>>>>>> }
>>>>>>
>>>>>> int i;
>>>>>> for(i = 0; i<cm->n_phonemes; i++) {
>>>>>> vector<float> scores;
>>>>>> float score = float(ph_conf[i].p);
>>>>>> if((width<=0 || i<width) && (thresh<=0 || score>=thresh)) {
>>>>>> string wd(cm->phonemes[ph_conf[i].phoneme]);
>>>>>> Word word;
>>>>>> word.CreateFromString(Input,factor_order,wd,false);
>>>>>> scores.push_back(score);
>>>>>> pair<Word,vector<float> > linkdata(word,scores);
>>>>>> col.push_back(linkdata);
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> return col;
>>>>>> }
>>>>>>
>>>>>> /** Creates a confusion network from a NULL terminated phonemes list
>>>>>> and a phonemes confusion matrix */
>>>>>> static MyConfusionNet * phonemes_to_cn(confusion_matrix_t * cm,const
>>>>>> char ** phonemes, int width, double thresh, const vector<FactorType>
>>>>>> &factor_order) {
>>>>>> debug("start");
>>>>>>
>>>>>> MyConfusionNet * cn = new MyConfusionNet();
>>>>>>
>>>>>> int i = 0;
>>>>>> while(phonemes[i]!=NULL) {
>>>>>> debug("%s",phonemes[i]);
>>>>>> MyConfusionNet::Column col =
>>>>>> create_phoneme_col(cm,phonemes[i],width,thresh,factor_order);
>>>>>> cn->addCol(col);
>>>>>> i += 1;
>>>>>> }
>>>>>>
>>>>>> return cn;
>>>>>> }
>>>>>>
>>>>>> So, if anyone has an idea about what's wrong here.... thanks!
>>>>>>
>>>>>> cheers,
>>>
>>
>> --
>> Barry Haddow
>> University of Edinburgh
>> +44 (0) 131 651 3173
>>
>
>
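On the score: Barry's remark in the quoted thread about UntransformScore() can be checked with a little arithmetic. The sketch below assumes that UntransformScore() simply exponentiates a log-space score (an assumption, worth confirming in the Moses source). The helper names untransform(), to_log_space() and total_mass() are made up for illustration and are not part of Moses:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in, assuming UntransformScore() is a plain exp().
inline double untransform(double log_score) { return std::exp(log_score); }

// Invert the assumed transform to recover the log-space score.
inline double to_log_space(double value) {
  assert(value > 0.0);  // the logarithm is only defined for positive values
  return std::log(value);
}

// Sum of exponentiated scores over competing hypotheses. Nothing forces
// this sum to be 1, so a single exp(score) is not a probability.
inline double total_mass(const std::vector<double>& log_scores) {
  double sum = 0.0;
  for (double s : log_scores) sum += untransform(s);
  return sum;
}
```

Under that assumption, to_log_space(903011968.0) is about 20.6, so the score reported in my first mail was indeed positive in log space, as Barry pointed out. Likewise, made-up log scores such as -4.2, -5.0 and -7.5 exponentiate to values that sum to roughly 0.02 rather than 1, so the untransformed score is not a normalised probability.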
--
Sylvain Raybaud
_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support