I am not sure whether you have looked at the Matt Post et al. paper on collecting Indian language data via crowdsourcing.
I believe the details and the corpora can be found here:
http://joshua-decoder.org/data/indian-parallel-corpora/

On Thu, Dec 12, 2013 at 7:35 AM, Prasanth K <[email protected]> wrote:

> Pranjal,
>
> Well, the relationship between corpus size and BLEU score is complicated,
> so low scores cannot always be attributed to corpus size alone, but in
> this case, yes. That is the reason why you get low scores.
>
> - Prasanth
>
> On Thu, Dec 12, 2013 at 4:31 PM, Pranjal Das <[email protected]> wrote:
>
>> Thank you Prasanth, but why am I getting such a low BLEU score? I have
>> a very small corpus, about 2500 sentences. Is it because of that?
>>
>> *Pranjal Das*
>> Department of Information Technology,
>> Institute of Science and Technology,
>> Gauhati University, Guwahati, Assam
>> Phone- +91-8399879454
>>
>> On Thu, Dec 12, 2013 at 8:58 PM, Prasanth K <[email protected]> wrote:
>>
>>> Hi Pranjal,
>>>
>>> It's not uncommon to observe such differences when changing the
>>> direction of translation. Translation from English to Bengali is
>>> relatively harder because Bengali is morphologically rich, making it
>>> difficult to generate the correct surface forms. Given that BLEU is a
>>> pattern-matching algorithm that compares surface forms, the drop in
>>> the score could be partly attributed to not being able to generate
>>> the correct surface forms.
>>>
>>> You can look at the EuroMatrix, where similar patterns can be
>>> observed. Translation from English->Finnish gives worse results than
>>> the other way around.
>>> http://www.statmt.org/matrix/
>>>
>>> Prasanth
>>>
>>> On Thu, Dec 12, 2013 at 4:21 PM, Pranjal Das <[email protected]> wrote:
>>>
>>>> Hi all,
>>>> While doing Bengali to English translation I got a BLEU score of
>>>> 7.02, and doing English to Bengali I got 4.7.
>>>>
>>>> Why is the difference so high when I am using the same parallel
>>>> corpus?
>>>
>>> --
>>> "Theories have four stages of acceptance. i) this is worthless
>>> nonsense; ii) this is an interesting, but perverse, point of view,
>>> iii) this is true, but quite unimportant; iv) I always said so."
>>>
>>> --- J.B.S. Haldane
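
To illustrate the surface-form point made in the thread, here is a minimal sketch of the modified n-gram precision that BLEU is built on. It is not the Moses scoring script: it leaves out the brevity penalty and the geometric mean over n-gram orders, and the tokens with `+PAST`, `+ACC` and similar suffixes are hypothetical stand-ins for inflected word forms, not real Bengali data. The function names are my own.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(hypothesis, reference, n):
    """Clipped n-gram precision of a hypothesis against one reference.

    Each hypothesis n-gram counts as matched at most as many times as it
    occurs in the reference. Matching is on exact surface strings only.
    """
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram])
                  for gram, count in hyp_counts.items())
    return matched / total


# Hypothetical example: suffixes stand in for inflectional morphology.
reference = "the child read+PAST the book+ACC".split()
exact = "the child read+PAST the book+ACC".split()
wrong_suffix = "the child read+PRES the book+NOM".split()

p1_exact = modified_precision(exact, reference, 1)
p1_wrong = modified_precision(wrong_suffix, reference, 1)
p2_wrong = modified_precision(wrong_suffix, reference, 2)

print(p1_exact, p1_wrong, p2_wrong)
```

In this toy case the second hypothesis picks the right stems but the wrong inflections, and its unigram precision drops from 1.0 to 0.6 and its bigram precision to 0.25, because a wrong suffix makes the whole token (and every n-gram containing it) count as a miss. That is the effect to expect more often when translating into a morphologically rich target language.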
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
