I am not sure whether you have looked at the paper by Matt Post et al. about
collecting Indian-language parallel data via crowdsourcing.

I believe the details and the corpora can be found here:

http://joshua-decoder.org/data/indian-parallel-corpora/

On Thu, Dec 12, 2013 at 7:35 AM, Prasanth K <[email protected]> wrote:

> Pranjal,
>
> Well, the relationship between corpus size and BLEU score is complicated,
> and a low score usually cannot be attributed to corpus size alone, but in
> this case, yes: that is the reason you are getting low scores.
>
> - Prasanth
>
>
> On Thu, Dec 12, 2013 at 4:31 PM, Pranjal Das <[email protected]> wrote:
>
>> Thank you, Prasanth. But why am I getting such a low BLEU score? Actually,
>> I have a very small corpus, about 2,500 sentences. Is it because of that?
>>
>> *Pranjal Das*
>> Department of Information Technology,
>> Institute of Science and Technology,
>> Gauhati University, Guwahati, Assam
>> Phone- +91-8399879454
>>
>>
>> On Thu, Dec 12, 2013 at 8:58 PM, Prasanth K <[email protected]> wrote:
>>
>>> Hi Pranjal,
>>>
>>> It's not uncommon to observe such differences when changing the direction
>>> of translation. Translating from English into Bengali is relatively harder,
>>> as Bengali is morphologically rich, which makes it difficult to generate
>>> the correct surface forms. Since BLEU is an n-gram matching metric that
>>> compares surface forms, the drop in the score can be partly attributed to
>>> the system not generating the correct surface forms.
>>>
>>> You can look at the EuroMatrix, where similar patterns can be observed:
>>> translation from Finnish->English gives better results than the other way
>>> around.
>>> http://www.statmt.org/matrix/
>>>
>>> Prasanth
>>>
>>> On Thu, Dec 12, 2013 at 4:21 PM, Pranjal Das <[email protected]> wrote:
>>>
>>>> Hi all,
>>>> When translating from Bengali to English I got a BLEU score of 7.02,
>>>> and from English to Bengali I got 4.7.
>>>>
>>>> Why is the difference so large when I am using the same parallel corpus?
>>>>
>>>>
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> [email protected]
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>
>>>>
>>>
>>>
>>> --
>>> "Theories have four stages of acceptance. i) this is worthless nonsense;
>>> ii) this is an interesting, but perverse, point of view, iii) this is true,
>>> but quite unimportant; iv) I always said so."
>>>
>>>   --- J.B.S. Haldane
>>>
>>
>>
>
>
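To make the surface-form point in this thread concrete, here is a minimal sketch of single-reference, sentence-level BLEU in Python: clipped n-gram precisions, a uniform-weight geometric mean, and the brevity penalty. It is an illustration only, not the scorer Moses ships; tools such as multi-bleu.perl compute BLEU at corpus level by pooling n-gram counts over all test sentences and report the result scaled to 0-100, which is the scale of the 7.02 and 4.7 quoted above. The English sentences are a hypothetical stand-in for Bengali inflection: the hypothesis has the right lemmas but the wrong surface forms.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple of tokens) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(hyp, ref, n):
    """Clipped n-gram precision: each hypothesis n-gram is credited at most
    as many times as it occurs in the reference. Only exact token matches
    count, so a wrongly inflected word earns nothing."""
    hyp_counts = ngrams(hyp, n)
    ref_counts = ngrams(ref, n)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return clipped / total


def sentence_bleu(hyp, ref, max_n=4):
    """Unsmoothed single-reference BLEU: geometric mean of the clipped
    precisions for n = 1..max_n, multiplied by the brevity penalty."""
    precisions = [modified_precision(hyp, ref, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # log(0) is undefined; unsmoothed BLEU is zero here
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(hyp), len(ref)
    # Brevity penalty: 1 if the hypothesis is longer than the reference.
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_mean)


if __name__ == "__main__":
    ref = "the children played in the garden".split()
    inflected = "the child play in the garden".split()  # right lemmas, wrong forms
    print(sentence_bleu(ref, ref))  # 1.0
    print([round(modified_precision(inflected, ref, n), 2) for n in range(1, 5)])
    # [0.67, 0.4, 0.25, 0.0]
    print(sentence_bleu(inflected, ref))  # 0.0: no 4-gram survives the wrong forms
    print(round(sentence_bleu(inflected, ref, max_n=2), 2))  # 0.52
```

Two wrong inflections leave every 4-gram unmatched, so unsmoothed sentence BLEU drops to zero even though every lemma is correct. At corpus level the effect is diluted across sentences, but it works the same way, which makes it one plausible contributor to a lower score when translating into the morphologically richer language.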