Re: [Apertium-stuff] GSoC proposal draft: Developing a Morphological Analyzer for Torwali Language

Hèctor Alòs i Font Wed, 07 Apr 2021 21:41:17 -0700

Hi Naeem,

Thanks a lot for your very good and interesting draft application. Torwali
is an excellent language for Apertium. You know the challenges it presents
and the work already done on it, and you have shown that you are committed
to the language and the project. I am not a specialist in lexc/twol, but I
see a few general things that would improve your application:


* The coding challenge is very important. It proves you understand how
Apertium works (not only theoretically) and that you can do the job. So, do
it as well as you can now. Don't leave it until after the application
period.

* Your commitment of 30 hours per week is welcome, but bear in mind that
it is much more than what Google is asking for this year.

* You want to enter 50,000+ words in the morphological analyser. That's a
huge amount, but your work plan doesn't say when you are going to do it.
You should show how many words, and from which grammatical categories, you
would add in each time slot (two weeks in your case). Usually we start with
the closed categories. Once you detail these numbers in your proposal, we
will see how many words you can realistically reach.

* I have no idea how it is in the case of Dardic languages, but the
assignment of words to categories is not usually trivial in Indo-European
languages. Do existing works already have lists of words assigned to
paradigms? For example: lists of verbs following one model or another. If
not, the time needed for assignment increases. It is necessary to know this
in order to calculate the feasibility of introducing 50,000, 30,000 or
20,000 words.

* Are there extensive lists of words available in electronic format, with
their grammatical category, which you could use for your work? They should
be free. If they were copyrighted they could not be (semi-)automatically
uploaded to Apertium.

* With the very limited time we have this year for GSoC projects, building
a complete morphological analyser from scratch is very likely a perfectly
reasonable goal. Still, before putting so many words into it (especially if
you have to add them manually), I think it would be reasonable to spend a
couple of weeks training a morphological disambiguator.
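
As a rough illustration of the feasibility calculation mentioned above, here
is a minimal Python sketch. Only the 30 hours per week comes from the draft;
the 10-week coding period and the share of time spent on the lexicon are
assumptions chosen for illustration, so adjust them to your real plan.

```python
# Rough feasibility check for the lexicon-size targets discussed above.
# ASSUMPTIONS (not taken from the proposal): a 10-week coding period and
# half of the weekly hours spent on adding lexicon entries.

HOURS_PER_WEEK = 30    # commitment stated in the draft
WEEKS = 10             # assumed length of the coding period
LEXICON_SHARE = 0.5    # assumed fraction of time spent adding entries


def entries_per_hour(target_words: int) -> float:
    """Entries that must be added per lexicon hour to reach the target."""
    lexicon_hours = HOURS_PER_WEEK * WEEKS * LEXICON_SHARE
    return target_words / lexicon_hours


if __name__ == "__main__":
    for target in (50_000, 30_000, 20_000):
        rate = entries_per_hour(target)
        print(f"{target:>6} words -> {rate:6.1f} entries per hour")
```

If paradigms have to be assigned by hand, the resulting rate per hour is the
number to compare against what you can actually manage for each category.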

Hèctor

Message from Naeemuddin Hadi <naeemuddinh...@gmail.com> on Thu, 8 Apr 2021
at 1:46:

> Hello everyone,
>
> I am Naeem, a student at UET Peshawar. I want to participate in GSoC
> 2021. I am working to create a morphological analyzer for Torwali, an
> endangered language of northern Pakistan.
> I have prepared a draft proposal and would appreciate feedback before
> final submission. Links related to the coding challenge are included in
> the draft.
>
> link (Draft) :
> https://drive.google.com/file/d/1hnu6gRWVN3LjjxOj0BvimvJ56AIKfe6q/view?usp=sharing
>
>
> Regards,
> Naeem
> _______________________________________________
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>

