Hi Priyank,

Hindi-Punjabi seems to me a very nice pair for Apertium. Closely related pairs often give unsatisfactory results with Google, because most of the time the translation goes through English as an intermediate language. In any case, if you can give some data about the quality of the Google translator (as I did in my 2019 GSoC application <http://wiki.apertium.org/wiki/Hectoralos/GSOC_2019_proposal:_Catalan-Italian_and_Catalan-Portuguese#Current_situation_of_the_language_pairs>), I think it would be useful.
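One possible way to put a number on translation quality is word error rate (WER): the word-level edit distance between a system's output and a reference translation, divided by the length of the reference. The following is only a minimal sketch, assuming whitespace tokenization and a single reference per sentence; the function name `word_error_rate` is hypothetical and is not part of any Apertium tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace, and punctuation
    attached to a word counts as part of that word.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative call: one substituted word over three reference words.
print(word_error_rate("the cat sat", "the cat sit"))
```

Running the same function over the output of each translator against one post-edited reference would give comparable figures for the two systems.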
To submit an application for language-pair development, you are required to pass the so-called "coding challenge" <http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Adopt_a_language_pair#Coding_challenge>. Basically, this shows that you understand the basics of the architecture and know how to add new words to the dictionaries.

For the project itself, you'll need to add many words to the Punjabi and Punjabi-Hindi dictionaries, and to write transfer rules and lexical selection rules. If you intend to translate from Punjabi, you'll need to work on morphological disambiguation, which takes at least a couple of weeks. This is essential, since plenty of errors in Indo-European languages (and, I guess, not only those) come from bad morphological disambiguation. Usually, closed categories are added to the dictionaries first, and afterwards words are mostly added using frequency lists.

If there are free resources you can use, that would be great, but it is absolutely necessary not to copy automatically from copyrighted materials. For my own application this year, I'm asking people to release their resources under a free licence so that I can use them.

You may be interested in previous applications for developing language pairs, for instance this one <http://wiki.apertium.org/wiki/Grfro3d/proposal_apertium_cat-srd_and_ita-srd>, in addition to mine from last year.

Best wishes,
Hèctor

Message from Priyank Modi <priyankmod...@gmail.com> on Fri, 6 March 2020 at 23:49:

> Hi,
> I am trying to work towards developing the Hindi-Punjabi pair and needed
> some guidance on how to go about it. I ran the test files and could notice
> that the dictionary file for Punjabi needs work (even a lot of function
> words could not be found by the translator). Should I start with that? Are
> there some tests each stage needs to pass?
> Also, finally, what sort of work is expected to make a decent GSoC
> proposal? Of course, I'll be interested in developing this pair
> regardless, since even Google Translate doesn't seem to work well for
> this pair (for the test set specifically, the Apertium translator worked
> significantly better).
> Any help would be appreciated.
>
> Thanks.
>
> Warm regards,
> PM
>
> --
> Priyank Modi ● Undergrad Research Student
> IIIT-Hyderabad ● Language Technologies Research Center
> Mobile: +91 83281 45692
> Website <https://priyankmodipm.github.io/> ● Linkedin
> <https://www.linkedin.com/in/priyank-modi-81584b175/>
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff