On Mon, Apr 12, 2010 at 5:29 PM, Jimmy O'Regan <[email protected]> wrote:

> On 12 April 2010 12:44, Vineet Chaitanya <[email protected]> wrote:
> .
> >>
> >> Ok, there are at least a couple of GSOC students who are applying to do
> >> something with Hindi, I've made "apertiumising" the tagset one of their
> >> first goals... I think there are only pronouns and verbs left to go.
> >
> >    How about having programs for:
> >
> >    1. Automatic conversion of morphs from WX coding to Unicode coding
>
> Are you really coming back to this? Yeesh.
>

   Because there are a number of morph analysers in WX form which you would
probably like to have in Unicode, so why not have a program for the
conversion?
   We are not going to change over to Unicode until they come up with an
"alphabetic version" of our scripts. I believe a proposal for this is in the
pipeline.
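For point 1, the conversion itself is small. Below is a minimal sketch in Python, assuming the usual IIIT-Hyderabad WX letter assignments (w = त, W = थ, x = द, X = ध, M = anusvara, z = chandrabindu, and so on). The function name wx_to_devanagari is hypothetical, and nukta forms and other WX extensions are not handled. It would be applied only to the surface and lemma fields of a morph line, leaving the tags untouched.

```python
# Minimal WX -> Devanagari transliterator (sketch: core WX letters only,
# no nukta forms or other extensions).

VOWELS = {  # independent vowel letters (word-initial or after another vowel)
    "a": "अ", "A": "आ", "i": "इ", "I": "ई", "u": "उ", "U": "ऊ",
    "q": "ऋ", "e": "ए", "E": "ऐ", "o": "ओ", "O": "औ",
}
MATRAS = {  # dependent vowel signs; the inherent "a" has no sign
    "a": "", "A": "ा", "i": "ि", "I": "ी", "u": "ु", "U": "ू",
    "q": "ृ", "e": "े", "E": "ै", "o": "ो", "O": "ौ",
}
CONSONANTS = {
    "k": "क", "K": "ख", "g": "ग", "G": "घ", "f": "ङ",
    "c": "च", "C": "छ", "j": "ज", "J": "झ", "F": "ञ",
    "t": "ट", "T": "ठ", "d": "ड", "D": "ढ", "N": "ण",
    "w": "त", "W": "थ", "x": "द", "X": "ध", "n": "न",
    "p": "प", "P": "फ", "b": "ब", "B": "भ", "m": "म",
    "y": "य", "r": "र", "l": "ल", "v": "व",
    "S": "श", "R": "ष", "s": "स", "h": "ह",
}
SIGNS = {"M": "ं", "H": "ः", "z": "ँ"}  # anusvara, visarga, chandrabindu
VIRAMA = "्"


def wx_to_devanagari(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt in MATRAS:
                out.append(MATRAS[nxt])  # vowel sign (empty for inherent a)
                i += 1                   # the vowel has been consumed
            else:
                out.append(VIRAMA)       # no vowel follows: kill the inherent a
        elif ch in VOWELS:
            out.append(VOWELS[ch])       # vowel not attached to a consonant
        elif ch in SIGNS:
            out.append(SIGNS[ch])
        else:
            out.append(ch)               # spaces, digits, punctuation pass through
        i += 1
    return "".join(out)
```

For example, wx_to_devanagari("mEM jA sakawA WA") should give मैं जा सकता था. Working character by character keeps the table readable for anyone who knows both scripts.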


> >    2. Mapping our tagsets to apertiumised tagsets
> >
>
>
> Why can't we just agree to disagree; we don't work your way, you don't
> work our way. It's not hard to find project hosting, it can't be that
> difficult to think up your own project name. Do it your way, release
> it your way, and maintain it your way.
>

   Of course that is our plan.
   I am assuming that in spite of these differences we can still fruitfully
collaborate.
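For point 2, a tag mapping can be a plain lookup table. The sketch below uses hypothetical names (map_tags, TAG_MAP) and rewrites the attribute:value tags used in the morph output in this thread into Apertium-style symbols. prn, vblex, m, f, sg, pl and p1 to p3 are ordinary Apertium symbols, but dir/obl for direct/oblique case, and reading per:u/m/a as first/second/third person, are assumptions rather than an agreed mapping. Anything without an entry is reported instead of guessed.

```python
import re

# Matches one <attribute:value> tag, e.g. <case:d>
TAG_RE = re.compile(r"<([^:<>]+):([^<>]*)>")

# Hypothetical mapping table; "dir"/"obl" are illustrative choices only.
TAG_MAP = {
    ("cat", "p"): "prn", ("cat", "v"): "vblex",
    ("case", "d"): "dir", ("case", "o"): "obl",
    ("gen", "m"): "m", ("gen", "f"): "f",
    ("num", "s"): "sg", ("num", "p"): "pl",
    ("per", "u"): "p1", ("per", "m"): "p2", ("per", "a"): "p3",
}


def map_tags(tag_string: str) -> tuple[str, list[str]]:
    """Return (Apertium-style tags, list of unmapped 'attr:value' pairs)."""
    mapped, unmapped = [], []
    for attr, value in TAG_RE.findall(tag_string):
        symbol = TAG_MAP.get((attr, value))
        if symbol is None:
            unmapped.append(f"{attr}:{value}")  # left for a human to decide
        else:
            mapped.append(f"<{symbol}>")
    return "".join(mapped), unmapped
```

Keeping the table as data means either side can extend it without touching the code; the unmapped list shows exactly where the two tagsets still disagree (for instance parsarg or per:m_h2).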

Vineet Chaitanya

>
> >>
> >> >    Well, the TAM dictionary does not deal with concordance across
> >> > clauses. It restricts itself to "serial verb constructions" within a
> >> > clause. The idea is to replace a lot of rules by a large flat dictionary
> >> > which any layman who knows both the source and target languages can
> >> > easily maintain.
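The flat TAM dictionary idea can be sketched as a table keyed on the English TAM pattern, each row giving the Hindi verb-group template and the case the subject takes. This is a minimal Python sketch, not the implementation used by either project; the key strings, TamEntry and render_tam are hypothetical, and the three rows are simply the masculine-singular examples discussed in this thread.

```python
from typing import NamedTuple


class TamEntry(NamedTuple):
    hindi_template: str  # "{root}" is replaced by the Hindi verb root (WX)
    subject_case: str    # case the subject takes: "direct" or "oblique+ko"


# Hypothetical flat TAM dictionary: one row per English TAM pattern,
# editable by any bilingual without writing transfer rules.
TAM_DICT = {
    "would have -en": TamEntry("{root}wA", "direct"),
    "should have -en": TamEntry("{root}nA cAhiye WA", "oblique+ko"),
    "could have -en": TamEntry("{root} sakawA WA", "direct"),
}


def render_tam(english_tam: str, hindi_root: str) -> tuple[str, str]:
    """Fill in the Hindi verb group and report the required subject case."""
    entry = TAM_DICT[english_tam]
    return entry.hindi_template.format(root=hindi_root), entry.subject_case
```

For instance, render_tam("should have -en", "jA") returns ("jAnA cAhiye WA", "oblique+ko"), which also tells the caller to turn mEM into muJe.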
> >>
> >> Ok, so at least for Apertium this isn't really necessary, but if you
> >> were to do it, I would put it all in a macro.
> >
> >
> >    This is not clear to me. I would like to understand how Apertium's way of
> > handling TAMs is better than what I am trying to suggest.
> >
> >>
> >> >
> >> > >  I believe a lot of energy has been wasted on both sides because we
> >> > > do not know Catalan etc. and you do not know Indic languages, and
> >> > > each party kept harping on its own viewpoint, which the other party
> >> > > never cared to listen to.
> >> >
> >> >
> >> > Yes, almost certainly.
> >> >
> >> >  Do you know why we do not like to use "Unicode" for grammatical
> >> > purposes, though we do use it to show output at various stages? :-)
> >> > (Seriously, this may be ignored, right now)
> >>
> >> Yes, I know why, and that is ok for your purposes :)
> >
> >     Let us keep this for future.
> >
> >>
> >> >
> >> > Yes, but I don't know why you would do it. e.g. is the treatment of "I
> >> > would have gone", "I should have gone" and "I could have gone" that
> >> > much different?
> >> >
> >> >
> >> >     From the English point of view these look similar, but look at
> >> > their Hindi translations:
> >> >
> >> > I would have gone: mEM jAwA
> >> > I should have gone: muJe jAnA cAhiye WA. (Please also note mEM ->
> >> > muJe.)
> >> > I could have gone: mEM jA sakawA WA.
> >>
> >> Could you provide a morph output for these?
> >
> >     Ignore unnecessary fields in the following:
> >
> > mEM jAwA
> > ---------
> > ^mEM/mEM<cat:p><case:d><parsarg:0><gen:m><num:s><per:u>
> > ^jAwA/jA<cat:v><gen:m><num:s><per:u><tam:wA>/
> >
> > muJe jAnA cAhiye WA
> > ------------------
> > muJe/mEM<cat:p><case:o><parsarg:ko><gen:m><num:s><per:u>
> > jAnA/jA<cat:v><gen:m><num:s><per:u><tam:nA>
> > cAhiye/cAha<cat:v><gen:f><num:p><per:m_h2><tam:imper>
> > WA/WA<cat:v><gen:m><num:s><per:m><tam:WA>
> >
> > mEM jA sakawA WA
> > ------------------
> > ^mEM/mEM<cat:p><case:d><parsarg:0><gen:m><num:s><per:u>
> > jA/jA<cat:v><gen:m><num:s><per:u><tam:0>
> > sakawA/saka<cat:v><gen:m><num:s><per:u><tam:wA>
> > WA/WA<cat:v><gen:m><num:s><per:m><tam:WA>
> >
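The morph output above is regular enough to parse mechanically. A minimal sketch, assuming one analysis per word and the surface/lemma<attr:value>... layout shown (the leading ^ and trailing / are treated as optional, since they appear on some lines only); parse_morph is a hypothetical name. Collecting the tam values of the verbs in a group gives the kind of key a flat TAM dictionary could be looked up by.

```python
import re

# Matches one <attribute:value> tag, e.g. <tam:wA>
TAG_RE = re.compile(r"<([^:<>]+):([^<>]*)>")


def parse_morph(line: str) -> dict[str, str]:
    """Parse 'surface/lemma<attr:value>...' into a flat dict of fields."""
    line = line.strip().lstrip("^").rstrip("/")  # ^ and trailing / are optional
    surface, rest = line.split("/", 1)
    analysis = rest.split("/")[0]                 # keep the first analysis only
    lemma = analysis.split("<", 1)[0]
    features = dict(TAG_RE.findall(analysis))
    return {"surface": surface, "lemma": lemma, **features}


# Example: the verb chain of "mEM jA sakawA WA"
group = [
    "^mEM/mEM<cat:p><case:d><parsarg:0><gen:m><num:s><per:u>",
    "jA/jA<cat:v><gen:m><num:s><per:u><tam:0>",
    "sakawA/saka<cat:v><gen:m><num:s><per:u><tam:wA>",
    "WA/WA<cat:v><gen:m><num:s><per:m><tam:WA>",
]
words = [parse_morph(line) for line in group]
tam_chain = [w["tam"] for w in words if w["cat"] == "v"]  # ['0', 'wA', 'WA']
```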
> >
> >>
> >> >    Moreover, there will be additional complications because of GNP
> >> > (gender-number-person) agreement. All these problems can be bypassed
> >> > simply by having a relatively large flat "dictionary" of TAMs, which,
> >> > as I said above, any bilingual can easily handle. By the way, in India
> >> > bilinguals or even multilinguals are pretty common.
> >>
> >> >  (A linguist may use his rules to generate this flat file if he
> >> > likes.)
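One way to keep the flat file large but cheap to produce, as suggested above, is to generate the gender-number variants from a compact template. A minimal sketch, assuming hypothetical names (AGR, TEMPLATES, expand) and the standard Hindi -A/-e/-I agreement endings, with the feminine plural nasalised only on the final word (jA sakawI WIM). Person is not expanded because these particular forms do not vary with it, and the "should have" row, with its ko-marked subject, does not agree with the subject.

```python
from itertools import product

# Agreement vowels for Hindi -A forms: (non-final word, final word).
# Feminine plural nasalises only the last word, e.g. sakawI WIM.
AGR = {
    ("m", "s"): ("A", "A"),
    ("m", "p"): ("e", "e"),
    ("f", "s"): ("I", "I"),
    ("f", "p"): ("I", "IM"),
}

# Hypothetical compact templates: {agr} marks an agreeing vowel on a
# non-final word, {fin} the agreeing vowel on the final word.
TEMPLATES = {
    "would have -en": "{root}w{fin}",
    "should have -en": "{root}nA cAhiye WA",  # ko-subject: no subject agreement
    "could have -en": "{root} sakaw{agr} W{fin}",
}


def expand(root: str) -> list[tuple[str, str, str, str]]:
    """Generate flat (english_tam, gender, number, hindi) rows for one root."""
    rows = []
    for tam, template in TEMPLATES.items():
        for gen, num in product("mf", "sp"):
            agr, fin = AGR[(gen, num)]
            rows.append((tam, gen, num, template.format(root=root, agr=agr, fin=fin)))
    return rows
```

The generated rows are what a bilingual would then check and correct by hand; the templates are only a convenience for producing them.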
> >> > A question: Do you have any local tests like our regression tests:
> >> >
> >> >  http://wiki.apertium.org/wiki/Icelandic_and_English/Regression_tests
> >> >  http://wiki.apertium.org/wiki/Breton_and_French/Regression_tests
> >> >
> >> > that we could look at, for Hindi-English?
> >> >
> >> >   No. We simply keep a file of "verified sentences" of different
> >> > types, not systematically classified, which we run and check before
> >> > releasing a new version. Currently it has about 500 sentences.
> >> > We are working on English->Hindi. Would you like to see it?
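A file of verified sentences like this can be checked automatically before each release. A minimal sketch, assuming a hypothetical tab-separated format (source sentence, verified translation) with # comment lines, and a translate callable standing in for the actual pipeline; parse_pairs and run_regression are illustrative names.

```python
from collections.abc import Callable, Iterable


def parse_pairs(lines: Iterable[str]) -> list[tuple[str, str]]:
    """Read 'source<TAB>verified translation' lines; skip blanks and # comments."""
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        if "\t" not in line:
            raise ValueError(f"expected a tab-separated pair: {line!r}")
        source, expected = line.split("\t", 1)
        pairs.append((source, expected))
    return pairs


def run_regression(pairs: Iterable[tuple[str, str]],
                   translate: Callable[[str], str]) -> list[tuple[str, str, str]]:
    """Return (source, expected, actual) for every sentence whose output changed."""
    failures = []
    for source, expected in pairs:
        actual = translate(source)
        if actual.strip() != expected.strip():
            failures.append((source, expected, actual))
    return failures


# Usage (hypothetical file name and translator):
# with open("verified_sentences.tsv", encoding="utf-8") as f:
#     for src, want, got in run_regression(parse_pairs(f), translate):
#         print(f"{src}\n  expected: {want}\n  got:      {got}")
```

Classifying the sentences by construction type later would only mean adding a column; the comparison loop stays the same.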
> >>
> >> Yes, this would be useful thanks :)
> >
> >    I have attached the file.
> >
> > Regards
> > Vineet Chaitanya
> >>
> >> Fran
> >>
> >
> >
> >
> ------------------------------------------------------------------------------
> > Download Intel® Parallel Studio Eval
> > Try the new software tools for yourself. Speed compiling, find bugs
> > proactively, and fine-tune applications for parallel performance.
> > See why Intel Parallel Studio got high marks during beta.
> > http://p.sf.net/sfu/intel-sw-dev
> > _______________________________________________
> > Apertium-stuff mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/apertium-stuff
> >
> >
>
>
>
> --
> <Leftmost> jimregan, that's because deep inside you, you are evil.
> <Leftmost> Also not-so-deep inside you.
>