On 12 April 2010 12:44, Vineet Chaitanya <[email protected]> wrote:
>
>
> On Mon, Apr 12, 2010 at 4:07 AM, Francis Tyers <[email protected]> wrote:
>>
>> On Sun, 11 Apr 2010 at 11:41 +0530, Vineet Chaitanya wrote:
>> > > xOdanA xOda vblex.inf.m.sg
>> > > xOdane xOda vblex.inf.f.sg
>> > > xOdanI xOda vblex.inf.m.pl
>> > > xOdanIM xOda vblex.inf.f.pl
>> > >
>> > > There are some typos in the above.
>> >
>> >
>> > These are directly from the morphological analyser uploaded to
>> > Apertium SVN.
>> >
>> > Then it needs to be corrected.
>>
>> Ok, there are at least a couple of GSOC students who are applying to do
>> something with Hindi, I've made "apertiumising" the tagset one of their
>> first goals... I think there are only pronouns and verbs left to go.
>
> How about having programs for:
>
> 1. Automatic conversion of morphs from WX coding to Unicode coding

Are you really coming back to this? Yeesh.

> 2. Mapping our tagsets to apertiumised tagsets
>

Why can't we just agree to disagree; we don't work your way, you don't
work our way. It's not hard to find project hosting, it can't be that
difficult to think up your own project name. Do it your way, release it
your way, and maintain it your way.

>>
>> > Well, the TAM dictionary does not deal with concordance across clauses.
>> > It restricts itself to "serial verb construction" within a clause.
>> > The idea is to replace a lot of rules by a large flat dictionary which
>> > any layman who knows both source and target languages can easily maintain.
>>
>> Ok, so at least for Apertium this isn't really necessary, but if you
>> were to do it, I would put it all in a macro.
>
>
> This is not clear to me. I would like to understand how Apertium's way of
> handling TAMs is better than what I am trying to suggest.
>
>>
>> >
>> > > I believe a lot of energy has been wasted on both sides because we do
>> > > not know Catalan etc. and you do not know Indic languages, and each
>> > > party kept harping on its own viewpoint, which the other party
>> > > never cared to listen to.
>> >
>> >
>> > Yes, almost certainly.
>> >
>> > Do you know why we do not like to use "Unicode" for grammatical
>> > purposes, though we do use it in showing output at various stages? :-)
>> > (Seriously, this may be ignored, right now)
>>
>> Yes, I know why, and that is ok for your purposes :)
>
> Let us keep this for the future.
>
>>
>> >
>> > Yes, but I don't know why you would do it. e.g. is the treatment of "I
>> > would have gone", "I should have gone", "I could have gone" that much
>> > different ?
>> >
>> >
>> > From the English language point of view these look similar, but look at
>> > their Hindi translation:
>> >
>> > I would have gone : mEM jAtA
>> > I should have gone: muJe jAnA cAhiye thA. (Please also note mEM -> muJe)
>> > I could have gone: mEM jA sakatA thA.
>>
>> Could you provide a morph output for these ?
>
> Ignore unnecessary fields in the following:
>
> mEM jAwA
> ---------
> ^mEM/mEM<cat:p><case:d><parsarg:0><gen:m><num:s><per:u>
> ^jAwA/jA<cat:v><gen:m><num:s><per:u><tam:wA>/
>
> muJe jAnA cAhiye WA
> ------------------
> muJe/mEM<cat:p><case:o><parsarg:ko><gen:m><num:s><per:u>
> jAnA/jA<cat:v><gen:m><num:s><per:u><tam:nA>
> cAhiye/cAha<cat:v><gen:f><num:p><per:m_h2><tam:imper>
> WA/WA<cat:v><gen:m><num:s><per:m><tam:WA>
>
> mEM jA sakawA WA
> ------------------
> ^mEM/mEM<cat:p><case:d><parsarg:0><gen:m><num:s><per:u>
> jA/jA<cat:v><gen:m><num:s><per:u><tam:0>
> sakawA/saka<cat:v><gen:m><num:s><per:u><tam:wA>
> WA/WA<cat:v><gen:m><num:s><per:m><tam:WA>
>
>
>>
>> > Moreover, there will be additional complications because of gnp.
>> > All these problems can be bypassed by simply having a relatively
>> > large flat "dictionary" of TAMs, which, as I said above, any
>> > bilingual can easily handle. By the way, in India bilinguals or even
>> > multilinguals are pretty common.
>>
>> > (A linguist may use his rules to generate this flat file if he
>> > likes.)
>> > A question: Do you have any local tests like our regression tests:
>> >
>> > http://wiki.apertium.org/wiki/Icelandic_and_English/Regression_tests
>> > http://wiki.apertium.org/wiki/Breton_and_French/Regression_tests
>> >
>> > that we could look at -- for Hindi--English ?
>> >
>> > No. We simply keep a file of "verified sentences" of different
>> > types, not systematically classified, which we simply run and check
>> > before releasing a new version. Currently it has about 500 sentences.
>> > We are working on English->Hindi. Would you like to see it?
>>
>> Yes, this would be useful thanks :)
>
> I have attached the file.
>
> Regards
> Vineet Chaitanya
>>
>> Fran
>>
>
>
> ------------------------------------------------------------------------------
> Download Intel® Parallel Studio Eval
> Try the new software tools for yourself.
> Speed compiling, find bugs
> proactively, and fine-tune applications for parallel performance.
> See why Intel Parallel Studio got high marks during beta.
> http://p.sf.net/sfu/intel-sw-dev
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>
>

--
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
