Hi Khushi,

As for Hindi, you should first test the coverage.

According to this page (last edited in 2019), the dictionary had some
37,000 words (which is quite good, in principle), but coverage was only
about 83.1%: https://wiki.apertium.org/wiki/Languages.
So you should check the current state of the package.

You should install Apertium and the Hindi package. A corpus is needed: we
usually take Wikipedia and randomly select several million sentences from
it. With this, you can calculate the naive coverage and see whether the
dictionary has grown significantly since 2019.
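A minimal sketch of both steps in Python follows. It assumes a Wikipedia dump already cleaned to one sentence per line, and it uses reservoir sampling (Algorithm R) as one way to "select randomly" in a single pass without loading the whole dump. Coverage is counted on the analyser output, assuming the usual Apertium stream format, where every token is a `^...$` lexical unit and an unknown word appears as `^surface/*surface$`. It also assumes that no lexical unit spans a line break. The function names are hypothetical.

```python
import random
import re
from typing import Iterable, List, Optional, Tuple

# A lexical unit in the Apertium stream format: ^surface/analysis1/...$
# Backslash-escaped characters (e.g. \$ or \/) are consumed as a pair.
LU_RE = re.compile(r"\^((?:\\.|[^\\^$])*)\$")
# Inside a unit, an unknown word looks like "surface/*surface".
UNKNOWN_RE = re.compile(r"(?:\\.|[^\\/])*/\*")


def reservoir_sample(lines: Iterable[str], k: int,
                     seed: Optional[int] = None) -> List[str]:
    """Pick k lines uniformly at random in one pass (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            sample.append(line)
        else:
            # Keep the new line with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = line
    return sample


def naive_coverage(lines: Iterable[str]) -> Tuple[int, int, float]:
    """Return (tokens, unknown tokens, share of tokens with an analysis)."""
    total = unknown = 0
    for line in lines:
        for unit in LU_RE.findall(line):
            total += 1
            if UNKNOWN_RE.match(unit):
                unknown += 1
    coverage = (total - unknown) / total if total else 0.0
    return total, unknown, coverage
```

As a usage example, you could write the sample to a file and run it through the analyser. Assuming the package defines a `hin-morph` mode, as Apertium monolingual packages usually do, the command would be something like `apertium -d apertium-hin hin-morph < sample.txt > analysed.txt`, where the file names are placeholders. You would then pass the opened `analysed.txt` to `naive_coverage`. "Naive" here means that every token, punctuation included, counts the same, and that a token is covered as soon as it has any analysis, whether or not that analysis is correct.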

Once you have this, you can analyse where the problem comes from: is the
low coverage mainly due to missing words, or to morphological forms that
are not recognised even though the words do exist in the dictionary? With
37,000 words and 83% coverage, the latter seems likely (although it is
always good to have more words in dictionaries). The point is to
understand what is missing: nominal morphological forms, verbal ones?
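A frequency list of the unknown words is a natural first step for seeing what is missing. The Python sketch below builds that list and also tallies the endings of the unknown words. This is a crude heuristic: if many unknown tokens share endings that look like inflectional suffixes, the gap is more likely in the paradigms than in the lexicon. The sketch assumes that the analyser marks unknown words as `^surface/*surface$`, which is the usual Apertium convention. The function name and the default suffix length are illustrative choices only.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Surface form of an unknown unit in analyser output: ^surface/*surface$
UNKNOWN_SURFACE_RE = re.compile(r"\^((?:\\.|[^\\/^$])*)/\*")


def unknown_profile(lines: Iterable[str], suffix_len: int = 2,
                    top: int = 20) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Most frequent unknown surface forms, and their most frequent endings."""
    words: Counter = Counter()
    for line in lines:
        words.update(UNKNOWN_SURFACE_RE.findall(line))
    # Endings are counted in Unicode code points, so a Devanagari vowel sign
    # (matra) counts as a character of its own.
    endings: Counter = Counter()
    for word, freq in words.items():
        if len(word) > suffix_len:
            endings[word[-suffix_len:]] += freq
    return words.most_common(top), endings.most_common(top)
```

Going through the top of both lists by hand, with a native speaker's eye, should show fairly quickly whether the unknowns are mostly nouns, verb forms, or words that are simply absent from the dictionary.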

It is also worth checking whether there are free sources from which the
dictionary could be expanded automatically or semi-automatically.

On this basis, you can judge whether there is enough work for a project.
Most probably there is, for a small or at most a medium-sized one.

Hèctor


On Sat, 25 Feb 2023 at 10:02, Khushi - <12khushi...@gmail.com>
wrote:

> Thanks a lot for your feedback!
> It would be great if you could tell me how I should get started with this
> and what milestones I should aim for in order to improve it.
>
> Regards,
> Khushi Harsure
>
> On Fri, 24 Feb 2023 at 23:46, Hèctor Alòs i Font <hectora...@gmail.com>
> wrote:
>
>> Hi Khushi,
>>
>> First: Hindi-Marathi is already available on Google. I think you should
>> work out why it would be useful to develop it in Apertium. A priori, it
>> does not seem like an especially promising project.
>>
>> As for your current question, why should the pair be created again from
>> scratch? Have you seen something wrong with it? In principle, I don't see
>> at all why the work that has been done before should be wasted. I would
>> start over only if, after analysing it, it turned out to be appalling
>> (which would be strange).
>>
>> I don't know Hindi, but from what I saw two years ago, the morphological
>> analyser seems to have a lot of room for improvement. It might make sense
>> to concentrate on it and its morphological disambiguator. This would later
>> help in developing translators between low-resource Indo-Aryan languages
>> and Hindi.
>>
>> Hèctor
>>
>> On Fri, 24 Feb 2023 at 19:58, Khushi - <12khushi...@gmail.com>
>> wrote:
>>
>>> Respected sir,
>>> Thanks a lot for your response. I am glad that you appreciate it. I
>>> wanted to clear up some questions before I start working on it.
>>> I would like to know whether you want me to work on the existing
>>> Marathi-Hindi translator or create a new one from scratch. In the former
>>> case, what kind of improvements or contributions would be expected?
>>> Looking forward to hearing from you soon!
>>>
>>> Regards,
>>> Khushi Harsure
>>>
>>> On Fri, 24 Feb 2023 at 20:05, Daniel Swanson <awesomeevildu...@gmail.com>
>>> wrote:
>>>
>>>> Hi Khushi,
>>>>
>>>> Yeah, that sounds like a good project to me.
>>>>
>>>> Next steps would be opening a pull request on
>>>> https://github.com/apertium/apertium-mar-hin and requesting a wiki
>>>> account to write your workplan.
>>>>
>>>> Daniel
>>>>
>>>> On Fri, Feb 24, 2023 at 4:09 AM Khushi - <12khushi...@gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> ---------- Forwarded message ---------
>>>>> From: Khushi - <12khushi...@gmail.com>
>>>>> Date: Fri, 24 Feb 2023 at 14:24
>>>>> Subject: Re : [Apertium-stuff] GSOC 2023
>>>>> To: <unham...@fsfe.org>
>>>>>
>>>>>
>>>>> Hello!
>>>>>
>>>>> This is Khushi Harsure, an undergraduate Computer Science student from
>>>>> India. I'd like to participate in Google Summer of Code 2023 with
>>>>> Apertium. The project idea of adding a new language pair caught my
>>>>> interest and, being a native speaker, I was planning to work on adding
>>>>> the Hindi-Marathi pair. Hindi-English and English-Marathi pairs have been
>>>>> added by past GSoC contributors, but the Hindi-Marathi pair remains
>>>>> unworked on. Before starting, I wanted to confirm whether this would be a
>>>>> suitable GSoC project.
>>>>> I would also like to know which steps to follow after installing
>>>>> Apertium, apart from doing the coding challenge. Looking forward to
>>>>> hearing from you.
>>>>>
>>>>> Regards,
>>>>> Khushi Harsure
>>>>>
>>>>> _______________________________________________
>>>>> Apertium-stuff mailing list
>>>>> Apertium-stuff@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>>>>