Re: [Moses-support] please help me with the code - getting word index

Rico Sennrich Sat, 20 Jun 2015 06:46:55 -0700

Hi Amir,

There is currently no method that returns this, but BilingualLM (moses/LM/BilingualLM) calculates and uses the absolute source position of each terminal - search for absolute_source_position.
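
As a rough illustration of that calculation (a sketch of the idea, not the BilingualLM code itself): walk the source side of the rule from the start of the hypothesis's source range, advancing by one word for each terminal and by the width of the predecessor's span for each non-terminal. SourceSymbol, kNoPosition and AbsoluteSourcePositions are hypothetical names made up for this example. In Moses, the start position and span widths would presumably come from the ranges returned by ChartHypothesis::GetCurrSourceRange() on the current and previous hypotheses.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one symbol on the source side of a rule.
struct SourceSymbol {
  bool isNonTerminal;
  // For a non-terminal: number of source words covered by the predecessor
  // hypothesis that fills it. Ignored for terminals.
  std::size_t coveredWords;
};

// Marker stored for non-terminals, which have no single word index.
const std::size_t kNoPosition = static_cast<std::size_t>(-1);

// Returns, for each symbol of the rule's source side, the absolute index of
// that word in the input sentence (or kNoPosition for a non-terminal).
// rangeStart is the first source position covered by the current hypothesis.
std::vector<std::size_t> AbsoluteSourcePositions(
    std::size_t rangeStart, const std::vector<SourceSymbol>& ruleSource) {
  std::vector<std::size_t> positions;
  positions.reserve(ruleSource.size());
  std::size_t next = rangeStart;
  for (const SourceSymbol& sym : ruleSource) {
    if (sym.isNonTerminal) {
      positions.push_back(kNoPosition);
      next += sym.coveredWords;  // skip the words covered by the sub-hypothesis
    } else {
      positions.push_back(next);
      next += 1;  // a terminal covers exactly one source word
    }
  }
  return positions;
}
```

For a rule with source side "a X b" whose hypothesis starts at source position 3 and whose X covers two words, the terminals land at positions 3 and 6. The words under X would be resolved the same way by recursing into the previous hypothesis with its own range start.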


best wishes,
Rico

On 20/06/15 14:35, amir haghighi wrote:

Thanks Matthias

ChartHypothesis::GetCurrSourceRange() gets the source span that all terminals and non-terminals in the current hypothesis cover in the source sentence. I'd like to know which terminals (and non-terminals) correspond to which source word index in the source sentence. Could you guide me on how to obtain that?


Thanks again

On Thu, Jun 18, 2015 at 9:48 PM, Matthias Huck <[email protected]> wrote:


    Hi,

    You can calculate absolute positions in the source sentence based on the
    words range of the current hypothesis and those of the direct
    predecessors (in case of right-hand side non-terminals).

    Take a look at these methods:

            InputPath::GetWordsRange()
            ChartHypothesis::GetCurrSourceRange()
            ChartCellLabel::GetCoverage()

    Cheers,
    Matthias


    On Thu, 2015-06-18 at 20:23 +0430, amir haghighi wrote:
    > Hi everybody
    >
    >
    > I wrote the following code to get an ordered list of the source words
    > inside a hypothesis. It gets the words in their translation order, but I
    > need not only the words' strings but also the index of each word in the
    > original sentence.
    >
    > Could you please help me get the index of each word in srcPhrase within
    > the sentence?
    >
    >
    > void Amir::GetSourcePhrase2(const ChartHypothesis& cur_hypo, Phrase &srcPhrase) const
    > {
    >     AmirUtils utility;
    >     TargetPhrase targetPh = cur_hypo.GetCurrTargetPhrase();
    >     const Phrase *sourcePh = targetPh.GetRuleSource();
    >     int targetWordsNum = cur_hypo.GetCurrTargetPhrase().GetSize();
    >     std::vector<Word> source, orderedSource;
    >     std::vector<int> alignmentVector;
    >     std::vector<bool> isAligned;
    >
    >     std::vector<std::set<size_t> > sourcePosSets;
    >
    >     for (int targetP = 0; targetP < targetWordsNum; targetP++) {
    >         //std::cerr<<"setting alignments for targetword: "<<targetP<<endl;
    >         sourcePosSets.push_back(cur_hypo.GetCurrTargetPhrase().GetAlignTerm().GetAlignmentsForTarget(targetP));
    >     }
    >
    >     for (int ii = targetWordsNum - 1; ii >= 0; ii--) {
    >         std::set<size_t> cur_srcPosSet = sourcePosSets[ii];
    >         for (std::set<size_t>::const_iterator alignmet = cur_srcPosSet.begin();
    >              alignmet != cur_srcPosSet.end(); ++alignmet) {
    >             int alignmentElement = *alignmet;
    >             for (int index = 0; index < ii; index++) { // keep the rightmost one and remove the others
    >                 // remove it from the list
    >                 if (sourcePosSets[index].size() > 0) {
    >                     //std::cerr<<" removing "<<*alignmet<<endl;
    >                     //std::cerr<<"  for set with size: "<<sourcePosSets[index].size()<<endl;
    >                     sourcePosSets[index].erase(alignmentElement);
    >                 }
    >             }
    >         }
    >     }
    >
    >     for (size_t posT = 0; posT < cur_hypo.GetCurrTargetPhrase().GetSize(); ++posT) {
    >         const Word &word = cur_hypo.GetCurrTargetPhrase().GetWord(posT);
    >         if (word.IsNonTerminal()) {
    >             // non-term. fill out with prev hypo
    >             size_t nonTermInd = cur_hypo.GetCurrTargetPhrase().GetAlignNonTerm().GetNonTermIndexMap()[posT];
    >             const ChartHypothesis *prevHypo = cur_hypo.GetPrevHypo(nonTermInd);
    >             GetSourcePhrase2(*prevHypo, srcPhrase);
    >         }
    >         else {
    >             for (std::set<size_t>::const_iterator it = sourcePosSets[posT].begin();
    >                  it != sourcePosSets[posT].end(); it++) {
    >                 srcPhrase.AddWord(sourcePh->GetWord(*it));
    >             }
    >         }
    >     }
    > }
    > _______________________________________________
    > Moses-support mailing list
    > [email protected] <mailto:[email protected]>
    > http://mailman.mit.edu/mailman/listinfo/moses-support



    --
    The University of Edinburgh is a charitable body, registered in
    Scotland, with registration number SC005336.




_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support

