Re: [Apertium-stuff] Morphological analyser for Apertium

Gourab Chakraborty IIIT Dharwad Wed, 24 Mar 2021 23:45:53 -0700

Regarding the coverage of the Apertium English-Bengali pair: the naïve
coverage seems high enough (>70%), with precision above 99.6% and a
negligible word error rate. So one option is to increase the coverage
further; the other, as Hèctor suggested, is to create a new language pair
(Bengali-Hindi) that would be ready for publication. I would personally
prefer the second, as creating a new pair for Apertium seems more
interesting. Which direction should I take for GSoC? Also, if I go with
the second option, should I create an initial PR for apertium-hin-ben?
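
For reference, here is a minimal sketch of how the two figures above could be
computed. It assumes the analyser output is in the Apertium stream format
(^surface/analysis1/analysis2$, with unknown words marked by a leading "*" in
the analysis) and it ignores escaped characters, so treat it as an
illustration rather than the project's own evaluation script. The function
names naive_coverage and word_error_rate are made up for this example.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification (assumption): escaped characters such as \/ or \$ are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def naive_coverage(analysed: str) -> float:
    """Fraction of lexical units that received at least one analysis.

    A unit counts as unknown when its analysis part starts with '*'.
    """
    units = LEXICAL_UNIT.findall(analysed)
    if not units:
        return 0.0
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] = edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    sample = "^the/the<det><def><sp>$ ^cat/cat<n><sg>$ ^xyzzy/*xyzzy$"
    print(f"coverage: {naive_coverage(sample):.2%}")
    print(f"WER: {word_error_rate('a b c d', 'a x c'):.2%}")
```

The analysed text would come from running the monolingual analyser over a
corpus; the WER reference would be a post-edited version of the translator's
output.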


Thanks!

On Tue, Mar 23, 2021 at 10:04 AM Gourab Chakraborty IIIT Dharwad <
19bcs...@iiitdwd.ac.in> wrote:

> Thanks a lot for the feedback, Hèctor. I will change my proposal to the
> creation of a language pair (Hindi-Bengali) that is ready for publication.
> I'm also working on the corpus coverage of -ben, as Daniel suggested. For
> now I'm focusing on apertium-ben, for the Hindi-Bengali language pair.
> Once again, thanks a lot for the feedback!
>
>
> On Tue, Mar 23, 2021 at 9:41 AM Hèctor Alòs i Font <hectora...@gmail.com>
> wrote:
>
>> Hi Gourab,
>>
>> There was some work on Bengali a long time ago:
>> Faridee AZM, Tyers FM (2009) Development of a morphological analyser for
>> Bengali. In: Pérez-Ortiz J, Sánchez-Martínez F, Tyers F (eds) Proceedings
>> of the First International Workshop on Free/Open-Source Rule-Based
>> Machine Translation, Universidad de Alicante. Departamento de Lenguajes y
>> Sistemas Informáticos, Alicante, Spain, pp 43–50.
>>
>> You should see how much it covers, as Daniel said. If the basis is done,
>> as I imagine, it would be more interesting to orient the proposal towards
>> the creation of a pair that is ready for publication. We have quite a few
>> parsers in different states of evolution, in particular for Indian
>> languages, but relatively few realised pairs. It would be very interesting
>> to have a "Bengali - another Indo-Iranian language" pair. Hindi-Bengali
>> would probably be the best option, as Hindi and Urdu are, to date, the only
>> languages that have been released in Apertium. Given that there is much
>> less time available in GSoC this year, one option would be to work mainly
>> in one direction. From Hindi to Bengali would be the easiest option because
>> it would also avoid having to work a lot on morphological disambiguation
>> (which should be more or less satisfactorily solved for Hindi). This would
>> make the project concentrate on 1) finishing the morphological analysis of
>> Bengali, 2) creating/expanding the transfer rules, 3) creating the lexical
>> selection rules, 4) adding several thousand words in the bidix, 5) testing
>> on real texts to fine-tune the translator and presenting a finished
>> translator with a WER of less than 25%, ready for publication, at the end
>> of the project. Last but not least, a Hindi-to-Bengali translator should
>> be, as a rule, easier for a Bengali speaker to build than the opposite
>> direction.
>>
>> Hèctor
>>
>> Missatge de Daniel Swanson <awesomeevildu...@gmail.com> del dia dt., 23
>> de març 2021 a les 0:11:
>>
>>> Hi Gourab,
>>>
>>> My recommendation would be to evaluate the current status of -ben and
>>> -bn-en in terms of corpus coverage and WER and then incorporate into
>>> your proposal what those numbers are now and how much you think you
>>> can improve them.
>>>
>>> A pull request to one of the repositories involved would also be
>>> worthwhile, both in terms of your understanding of how to accomplish
>>> the tasks in your proposal and for the mentors to be able to evaluate
>>> your proposal.
>>>
>>> Daniel
>>>
>>> On Mon, Mar 22, 2021 at 3:06 PM Gourab Chakraborty IIIT Dharwad
>>> <19bcs...@iiitdwd.ac.in> wrote:
>>> >
>>> >
>>> > Hi,
>>> > I would like to participate in GSoC and am interested in contributing
>>> > to improving the transfer system for apertium-bn-en. My work would fall
>>> > in the "Develop a morphological analyser" category of the idea list.
>>> > I'm a native speaker of Bengali and am really excited about the project.
>>> >
>>> > I have gone through the official documentation and have already set up
>>> > Apertium on my Ubuntu system.
>>> >
>>> > I have prepared a draft of my GSoC proposal (
>>> > https://docs.google.com/document/d/1S5EY6Eddu4v1ZMqgkM0Kjl_27kBhZkDkEz0Ddmnrotk/edit?usp=sharing).
>>> > Since this is my first proposal for GSoC, I would really appreciate any
>>> > feedback. Also, what should I do next?
>>> >
>>> > Thank you
>>> > --
>>> > Gourab Chakraborty (IRC: gourab337)
>>> > _______________________________________________
>>> > Apertium-stuff mailing list
>>> > Apertium-stuff@lists.sourceforge.net
>>> > https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>>
>>>
>>
>
>
> --
> Gourab Chakraborty
> 2nd year, CSE @ IIIT Dharwad
>


-- 
Gourab Chakraborty
2nd year, CSE @ IIIT Dharwad

