Re: [Apertium-stuff] Anaphora Resolution and Long Distance Agreement Resolution

Hèctor Alòs i Font Fri, 29 Mar 2019 22:20:05 -0700

Hi Tanmai,

I've added some comments between paragraphs (especially on zero pronouns).


Message from Tanmai Khanna <khanna.tan...@gmail.com> on Sat, 30 March
2019 at 1:11:

> Hi Hector,
> Thanks for all your comments. I really appreciate it! :) I'll try to
> respond to the best of my abilities:
>
> When I claimed "The girl ate his apple" is grammatically incoherent, I
> meant in the case that this is all of the discourse. You're right that a
> pronoun could refer to something in the real world which isn't present in
> discourse, but that kind of anaphora resolution is impossible if you have
> just text so usually, we just ignore it.
>
> Before I start answering the question, I also want to point out that this
> is an endeavour to build a tool that would otherwise use a lot more
> linguistically complex knowledge, without that knowledge, and to make it
> good enough with the simple linguistic features available. Some
> parts of what can be done or can't be done will be found out experimentally
> but I added them in my proposal so that we can try and make an informed
> decision as to whether something can be language independent or not.
>
> 1. Following this thought, let's talk about marking verbs with
> antecedents. For dealing with zero pronouns, we *have to* mark the
> verbs with the antecedents and hence it is something that will be a part of
> this tool.
>
> You're right in saying that it will be hard to capture the subject of a
> verb without any configuration. However, that wasn't what I was trying to
> do. *I decided to treat zero pronouns as literally zero pronouns.* Assume
> a pronoun exists right before the verb and then perform anaphora resolution
> on this zero pronoun. This tool will be language agnostic. If the results
> are unsatisfactory, we can funnel down and create language-specific
> features to identify the subject :)
>

Assuming that a pronoun exists right before the verb is highly
language-specific. This works, as a rule, for SVO languages, like English,
Spanish and Catalan, but will not work for SOV languages, like (typically)
Turkic and Uralic languages (but also, among others, Hindi and German), or
for VSO languages, like Arabic and the Celtic languages. As we have quite a
lot of non-SVO languages in Apertium, searching for a subject right before
the verb seems a bad guess.
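
To make the word-order point concrete, here is a minimal sketch of the "nearest noun before the verb" guess applied to the same toy clause in three orders. The function name, the coarse tags and the English stand-in tokens are all made up for illustration; this is not Apertium code and not the heuristic from the proposal, only an approximation of it.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word, coarse tag): "n" = noun, "v" = verb


def nearest_noun_before(tokens: List[Token], verb_index: int) -> Optional[str]:
    """Naive subject guess: the closest noun to the left of the verb."""
    for i in range(verb_index - 1, -1, -1):
        if tokens[i][1] == "n":
            return tokens[i][0]
    return None  # nothing to the left of the verb


# The same clause ("girl eats apple") in three constituent orders,
# written with English words so that only the order changes.
examples = {
    "SVO": [("girl", "n"), ("eats", "v"), ("apple", "n")],
    "SOV": [("girl", "n"), ("apple", "n"), ("eats", "v")],
    "VSO": [("eats", "v"), ("girl", "n"), ("apple", "n")],
}

for order, toks in examples.items():
    verb_index = next(i for i, (_, tag) in enumerate(toks) if tag == "v")
    print(order, nearest_noun_before(toks, verb_index))
```

In this toy setting the guess finds the subject only for SVO; for SOV it returns the object, and for VSO it finds nothing.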

Furthermore, even in an SVO language like Spanish, there are several quite
frequent verbs whose subject is located after the verb, e.g.:
Me faltan libros
Me gustan los plátanos
Me duelen las muelas
etc.
And in SVO languages like Russian or Esperanto, it is not rare to place the
subject after the verb, since the case tells us what the subject is.

Also, in a language like Spanish there are quite a lot of time
expressions like
El lunes irá al médico
(word-for-word translation: Monday will-go to-the doctor)
It is very likely that "lunes" will be chosen as the subject of "irá".
(The same for dates e.g.: El 3 de abril irá al médico = The 3 of April
will-go to-the doctor.)

So, I think the system should deal with different language typologies, and
would probably need some configuration to deal with "special verbs" in a
specific language, like "faltar", "gustar" and "doler" in the Spanish
examples. Of course, you can check what results the system you propose
gets on the EU corpus, but I don't think there will be a good percentage of
success in German, Finnish and Hungarian, and, I guess, the results will be
worse in Slavic languages than in Romance languages and, of course, English.
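
As a rough illustration of what such a configuration could look like, here is a minimal sketch. Everything in it is hypothetical: `LANG_CONFIG`, its keys and `find_subject` do not exist in Apertium, the tags are simplified, and the time-noun list contains only the one example from this message.

```python
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, str]  # (toy lemma, coarse tag)

# Hypothetical per-language settings; none of these keys exist in Apertium.
LANG_CONFIG: Dict[str, dict] = {
    "spa": {
        "default_side": "before",  # where to look for the subject by default
        "postverbal_subject_verbs": {"faltar", "gustar", "doler"},
        "time_nouns": {"lunes"},  # a real list would need far more entries
    },
}


def find_subject(tokens: List[Token], verb_index: int, lang: str) -> Optional[str]:
    """Guess the overt subject of the verb, or None if none is found."""
    cfg = LANG_CONFIG[lang]
    verb_lemma = tokens[verb_index][0]
    if verb_lemma in cfg["postverbal_subject_verbs"]:
        side = "after"
    else:
        side = cfg["default_side"]
    if side == "before":
        indices = range(verb_index - 1, -1, -1)
    else:
        indices = range(verb_index + 1, len(tokens))
    for i in indices:
        lemma, tag = tokens[i]
        # Skip nouns that are configured as time expressions.
        if tag == "n" and lemma not in cfg["time_nouns"]:
            return lemma
    return None


# "Me gustan los plátanos"
gustar = [("me", "prn"), ("gustar", "v"), ("el", "det"), ("plátano", "n")]
# "El lunes irá al médico"
lunes = [("el", "det"), ("lunes", "n"), ("ir", "v"),
         ("a", "pr"), ("el", "det"), ("médico", "n")]

print(find_subject(gustar, 1, "spa"))
print(find_subject(lunes, 2, "spa"))
```

With these settings "plátano" is found as the subject of "gustar", while "lunes" is skipped, so "irá" is left without an overt subject and the zero pronoun still has to be resolved from the preceding context.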

> 2. Identifying antecedents of adjectives (so to speak) will require
> separate metrics, but these examples are exactly along the lines of what
> I've been thinking, i.e. detecting relative clauses and moving them out of
> the way to let the adjective recognise its antecedent. It probably
> recognises that for "The lady with the book" because "the book" is part
> of a PP which cannot be the subject of "is", similarly I will try to create
> relative clause detection to ignore that and connect nice to the nice lady.
>
> 3. So "tall" would get the correct adjective if we could do anaphora
> resolution for first and second person pronouns but that becomes a lot more
> complex than third person pronouns. Correct me if I'm wrong, but first and
> second person pronouns are usually resolved in the real world, and not very
> often said first in context. If you ask me I would leave those out for now.
> But you're right, it is interesting to think about how to deal with them.
> Maybe in cases where the person introduces themselves first, we should be
> able to attach it to "I" in "I am".
>

Yes, the problem is the one you describe. It is generally impossible in,
for example, an English text to know whether "I" or "you" is male or female,
or whether "we" is inclusive or exclusive, etc. That's why I think it's
better to forget about the 1st and 2nd persons (imho).


> 4. I was told that Anaphora is needed in Catalan as well, and if we use
> the same module for both we still have to test how it performs on both. But
> as mentioned in the proposal, I'll try to make the anaphora tool as
> language agnostic as possible and will test it with multiple pairs to see
> the result. If you have any pair suggestions right now that need it I can
> add them.
>

Yes, of course, for the English-Catalan pair anaphora resolution will
be very useful. I'm simply saying that it does not make much sense to test
the system with two twin pairs like English-Spanish and English-Catalan.
Either of them is fine. Better to use a second, quite different pair for
testing. But this will probably need someone who has time to run these
tests during the GSoC period. I'll probably (and hopefully) be busy, so I
won't be able to do it myself.


> 5. I'm using Apertium Simpleton UI for MacOS and for "La chica está aquí,
> lleva un vestido rojo.", I get "The girl is here, spends a red dress"
> (Attaching Screenshot makes email too big to send so just take my word for
> it :P ). Not sure why
>

No problem. I ran the test on the web :)

Best,
Hèctor


> Thanks for all your questions and suggestions, they'll definitely help me
> build a better tool. I really hope I was able to answer your questions
> satisfactorily. If not, I apologise and I wouldn't mind a follow up. It
> will certainly help me even more. :)
>
> On Sat, Mar 30, 2019 at 12:54 AM Hèctor Alòs i Font <hectora...@gmail.com>
> wrote:
>
>> Hi Tanmai,
>>
>> I won't be a mentor, but I asked for anaphora resolution in Apertium, so,
>> if I am allowed, I'd like some clarification about the proposal (which, I
>> think, is great - congrats).
>>
>> First of all, note that "The girl ate his apple" is not grammatically
>> incoherent. Maybe she ate an apple given by a male friend of hers. Anaphora
>> resolution is complicated i.a. because language is often ambiguous.
>>
>> 1. I've been thinking about the example
>>
>> La chica comió su manzana
>>
>> Let's suppose that the antecedent of "su" is "la chica".
>> If the target language were a Slavic language or Esperanto, the
>> selection would not be only between "his", "her" or "its", but would also
>> include a reflexive possessive pronoun, e.g. in Russian Девушка съела своё
>> яблоко, but not Девушка съела её яблоко. If using the proposal in
>> http://wiki.apertium.org/wiki/Anaphora_resolution I'm not sure how we
>> could deal with it. We would probably need to have a referent in the verb
>> too, in order to be able to compare in the transfer rules whether the
>> antecedent of "su" is also the antecedent of "comió".
>>
>> So, my point is: will the user be able to "configure" for which parts of
>> speech the antecedent should be tracked? E.g. for the Catalan-Spanish pair
>> I don't see any need to track the "antecedents" of verbs, but for e.g.
>> Spanish to English it seems necessary for dealing with zero pronouns.
>>
>> (By the way, I am surprised that e.g. the subject of a verb can be
>> tracked by a language-independent tool without any configuration. I really
>> doubt this can be true.)
>>
>> 2. The examples in "Reflexive pronouns" and "Long distance agreement"
>> seem very difficult. I'd propose a few simpler agreements:
>> * The lady with the book is nice.
>> * The lady reading the book is nice.
>> * The lady who reads the book is nice.
>> "Nice" should be feminine in Spanish/Catalan (currently it happens only
>> in the first case)
>> * The singers that sing sing well.
>> Both "sing" should be p3pl in Spanish/Catalan, currently they are not
>> ("Los cantantes que canta canta bien").
>>
>> 3. Let's accept that we will deal only with the 3rd person. It is too
>> complicated to resolve:
>> * I'm tall
>> (gender?)
>> * You are tall
>> (gender? number?)
>>
>> 4. I cannot see why it should be useful to test the system with the
>> Spanish-English and Catalan-English pairs. As for the anaphora, if I am not
>> wrong, Catalan and Spanish are twins. One pair of the two seems enough.
>>
>> 5. One detail: the current translation of
>> La chica está aquí, lleva un vestido rojo.
>> is:
>> The girl is here, carries a red dress.
>>
>> Best,
>> Hèctor
>>
>> Message from Tanmai Khanna <khanna.tan...@gmail.com> on Fri, 29 March
>> 2019 at 15:48:
>>
>>> Hi,
>>> I have submitted a draft for review for the project "Anaphora
>>> Resolution" for GSoC 2019. The project will also include a tool for
>>> resolution of agreement for adjectives in Spanish, Catalan and other
>>> languages that need it.
>>>
>>> You can find the proposal here:
>>> http://wiki.apertium.org/wiki/User:Khannatanmai
>>>
>>> If anyone has any comments, suggestions, criticism, ideas, I would
>>> really appreciate it if you let me know, as it'll help me make a stronger
>>> proposal and a better tool for Apertium during GSoC 2019.
>>>
>>> Thanks and Regards,
>>> Tanmai Khanna
>>> IRC: khannatanmai
>>>
>>> --
>>> *Khanna, Tanmai*
>>> _______________________________________________
>>> Apertium-stuff mailing list
>>> Apertium-stuff@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>>
>>
>
>
> --
> *Khanna, Tanmai*
>

