Thank you Hieu,

The corpus is UTF-8, but there is a double space in this line. Are double spaces counted as a word? Should I remove the double spaces from the lines manually to get the correct sentence length?
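A minimal Python sketch of how a double space could inflate a token count, and how the spaces could be collapsed without editing lines by hand. It assumes, without confirmation in this thread, that the extra length comes from empty tokens between consecutive spaces. The function names are hypothetical and are not part of GIZA++ or Moses.

```python
import re


def normalize_spaces(line: str) -> str:
    """Collapse every run of whitespace into one space and trim both ends."""
    return re.sub(r"\s+", " ", line).strip()


def naive_token_count(line: str) -> int:
    """Count tokens by splitting on a single space.

    Consecutive spaces produce empty strings, which this count includes.
    This illustrates the assumed failure mode; it is not GIZA++'s own code.
    """
    return len(line.rstrip("\n").split(" "))


line = "a  b c"  # note the double space after "a"
print(naive_token_count(line))                    # 4
print(naive_token_count(normalize_spaces(line)))  # 3
```

Applying normalize_spaces to each line of both the source and target files, one line at a time, keeps the two sides of the parallel corpus aligned, because no lines are added or dropped.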
On Tue, Jan 21, 2014 at 4:12 AM, Hieu Hoang <[email protected]> wrote:

> On 20/01/2014 13:45, amir haghighi wrote:
>
> > Hello
> >
> > I've some questions about the giza word alignment.
> >
> > 1-where is the final alignment file?Is it the aligned.1.grow.... in the
> > model folder?
>
> yes.
>
> > 2-do indexes of the words of both target and source sentences start from
> > 0?
>
> yes
>
> > 3- how does giza calculate the length of a sentence?
>
> the number of words
>
> > I have a sentence with 11 tokens that are separated with space, but in
> > the alignment file it length is 13.
>
> strange. Are you sure your corpus file is encoded as UTF8? Are there
> double spaces in the line?
>
> > Regards
> > Amir
> >
> > _______________________________________________
> > Moses-support mailing list
> > [email protected]
> > http://mailman.mit.edu/mailman/listinfo/moses-support
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
