Hi Priyank,

Hindi-Punjabi seems to me a very nice pair for Apertium. Closely related
pairs often give unsatisfactory results with Google, because most of the
time there is an intermediate translation into English. In any case, if
you can give some data about the quality of the Google translator (as I
did in my 2019 GSoC application
<http://wiki.apertium.org/wiki/Hectoralos/GSOC_2019_proposal:_Catalan-Italian_and_Catalan-Portuguese#Current_situation_of_the_language_pairs>),
it may be useful, I think.
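
One common way to put a number on translation quality is the word error
rate (WER): the word-level edit distance between a system translation and
a reference translation, divided by the length of the reference. The
sketch below is only an illustration under that assumption; the function
name and the whitespace tokenisation are hypothetical choices, not
something prescribed in this thread.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Running this over the same test sentences for both translators would give
two comparable figures; lower is better.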

To present an application for language-pair development, you are
required to pass the so-called "coding challenge"
<http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Adopt_a_language_pair#Coding_challenge>.
Basically, this will show that you understand the basics of the architecture
and know how to add new words to the dictionaries.

For the project itself, you'll need to add many words to the Punjabi and
Punjabi-Hindi dictionaries, and to write transfer rules and lexical
selection rules. If you intend to translate from Punjabi, you'll need to
work on morphological disambiguation, which needs at least a couple of
weeks of work. This is essential, since plenty of errors in Indo-European
languages (and, I guess, not only in those) come from bad morphological
disambiguation. Usually, closed categories are added to the dictionaries
first, and afterwards words are mostly added using frequency lists. If
there are free resources you can use, that would be great, but it is
absolutely necessary not to copy automatically from copyrighted
materials. For my own application this year, I'm asking people to release
their resources under a free licence so that I can use them.
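
As a rough sketch of the frequency-list approach, the snippet below counts
the words in a corpus and lists the most frequent ones that are not yet
known. Everything in it is an assumption made for illustration: the
function names are hypothetical, the corpus is a plain string, and the set
of known words would have to be extracted from the dictionary by some
other means. Tokens are split on whitespace rather than with a regular
expression, because Python's `\w` may not match the combining vowel signs
used in Devanagari and Gurmukhi.

```python
from collections import Counter
import string

# Characters stripped from token edges: ASCII punctuation plus the danda
# marks used as sentence terminators in Hindi and Punjabi text.
EDGE_PUNCTUATION = string.punctuation + "।॥"


def frequency_list(text):
    """Return (word, count) pairs, most frequent first, ties alphabetical."""
    counts = Counter()
    for raw in text.split():
        token = raw.strip(EDGE_PUNCTUATION)
        if token:
            counts[token] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def missing_words(freq, known_words, limit=10):
    """Return the most frequent words that are absent from known_words."""
    missing = [(word, count) for word, count in freq if word not in known_words]
    return missing[:limit]
```

Given a free corpus, `missing_words(frequency_list(corpus), known)` would
give a first list of candidate entries, highest frequency first.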

You may be interested in previous applications for developing language
pairs, for instance this one
<http://wiki.apertium.org/wiki/Grfro3d/proposal_apertium_cat-srd_and_ita-srd>,
in addition to mine last year.

Best wishes,
Hèctor


Message from Priyank Modi <priyankmod...@gmail.com> on Fri, 6 March
2020 at 23:49:

> Hi,
> I am trying to work towards developing the Hindi-Punjabi pair and need
> some guidance on how to go about it. I ran the test files and noticed
> that the dictionary file for Punjabi needs work (even a lot of function
> words could not be found by the translator). Should I start with that? Are
> there some tests each stage needs to pass? Also, what sort of work
> is expected to make a decent GSoC proposal? Of course, I'll be interested in
> developing this pair regardless, since even Google Translate doesn't seem to
> work well for this pair (for the test set specifically, the Apertium
> translator worked significantly better).
> Any help would be appreciated.
>
> Thanks.
>
> Warm regards,
> PM
>
> --
> Priyank Modi       ●  Undergrad Research Student
> IIIT-Hyderabad        ●  Language Technologies Research Center
> Mobile:  +91 83281 45692
> Website <https://priyankmodipm.github.io/>    ●    Linkedin
> <https://www.linkedin.com/in/priyank-modi-81584b175/>
>
> _______________________________________________
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>
