Hey Hector,
Thanks again for all the comments!
I would also recommend that the mentors go through this email, as it
explores an important part of Anaphora Resolution, i.e. Zero Anaphora
Resolution.

> Also, in a language like Spanish there are quite a lot of time
> constructions like
> El lunes irá al médico
> (word by word translation: Monday will-go to-the doctor)
> It is very likely that "lunes" will be chosen as the subject of "irá".
> (The same for dates e.g.: El 3 de abril irá al médico = The 3 of April
> will-go to-the doctor.)
>

I was thinking about cases like this and wondering if I can make an
animate-inanimate-human distinction for nouns, or even just
animate-inanimate, so that I can eliminate a lot of candidates easily. POS
tags and gender info don't say much about this, so I'm still exploring this
option, but it could be incorporated in the impeding parameters, which is a
list that will keep being extended.
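
A rough sketch of what I mean, in Python. Everything here is an assumption
for illustration: the hand-written ANIMACY lexicon, the penalty value, and
the names (Candidate, inanimate_subject, IMPEDING_INDICATORS) are
hypothetical, and the sketch simply assumes the verb wants an animate
subject, as "ir" does in the examples above.

```python
from dataclasses import dataclass

# Hypothetical hand-written animacy lexicon (lemma -> class).
# Lemmas missing from it are treated as unknown and are not penalised.
ANIMACY = {
    "lunes": "inanimate",
    "abril": "inanimate",
    "médico": "human",
    "Juan": "human",
}

INANIMATE_PENALTY = -2  # assumed value, would need tuning


@dataclass
class Candidate:
    lemma: str
    salience: int  # score from the other indicators, assumed to be given


def inanimate_subject(cand: Candidate) -> int:
    """Impeding indicator: penalise candidates known to be inanimate."""
    if ANIMACY.get(cand.lemma, "unknown") == "inanimate":
        return INANIMATE_PENALTY
    return 0


# The list of impeding indicators; new ones can be appended later.
IMPEDING_INDICATORS = [inanimate_subject]


def score(cand: Candidate) -> int:
    """Base salience plus the contribution of every impeding indicator."""
    return cand.salience + sum(ind(cand) for ind in IMPEDING_INDICATORS)


def best_antecedent(candidates):
    """Return the highest-scoring candidate."""
    return max(candidates, key=score)


# "Juan está enfermo. El lunes irá al médico."
candidates = [Candidate("lunes", 3), Candidate("Juan", 2)]
best = best_antecedent(candidates)
```

Without the penalty "lunes" would win on salience alone; with it, "Juan" is
preferred as the subject of "irá".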


> Assuming that a pronoun exists right before the verb is highly language
> specific. This works, as a rule, for SVO languages, like English, Spanish
> and Catalan, but will not work for SOV languages, like (typically) Turkic
> and Uralic languages (but also i.a. Hindi and German), and VSO, like i.a.
> Arabic and Celtic languages. As we have quite a lot of non-SVO languages in
> Apertium, searching a subject right before the verb seems a bad guess.
>
> Furthermore, even for a SVO language like Spanish, there are several quite
> often verbs for whom the subject is located after the verb, e.g.:
> Me faltan libros
> Me gustan los plátanos
> Me duelen las muelas
> etc.
> Or in SVO languages like Russian or Esperanto, it is not rare to place the
> subject after the verb, since the case tells us what is the subject.
>
> So, I think the system should deal with different language typologies, and
> probably would need some configuration to deal with "special verbs" in a
> specific language, like "faltar", "gustar", "doler" given in the Spanish
> examples. Of course, you can try which are the results in the EU corpus
> with the system you propose, but I don't think there will be a good
> percentage of success in German, Finnish and Hungarian, and, I guess, they
> will be worse in Slavic languages than in Romance and, of course, English.
>

Hmmm, I see your point. I guess for zero pronouns it becomes more of a
Subject-Identification or a Semantic-Role-Labelling problem than an
Anaphora Resolution problem. It's true that assuming a pronoun right before
the verb wouldn't work for non-SVO languages, and it's funny I missed this
considering Hindi is my mother tongue.

I think it does make sense to build a system that deals with different
language typologies, at least for zero pronouns. It isn't technically
Anaphora Resolution, so I'm a little confused about the scope, but the
problem I have to deal with is choosing the correct pronoun, so there is no
point in restricting myself to traditional Anaphora Resolution.
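
As a first idea of what such a configuration could look like, here is a
minimal Python sketch. It is only an illustration under my own assumptions:
the TYPOLOGY table, its keys, and the functions subject_side and
find_subject are hypothetical and not an existing Apertium format. Tokens
are simplified to (lemma, tag) pairs using Apertium-style tag names.

```python
# Hypothetical per-language settings; the entries are illustrative only.
TYPOLOGY = {
    "spa": {
        "order": "SVO",
        # "Special verbs" whose subject usually follows the verb.
        "postverbal_subject_verbs": {"faltar", "gustar", "doler"},
    },
    "hin": {"order": "SOV", "postverbal_subject_verbs": set()},
    "ara": {"order": "VSO", "postverbal_subject_verbs": set()},
}


def subject_side(lang, verb_lemma):
    """Return 'before' or 'after': where to look for the subject."""
    cfg = TYPOLOGY[lang]
    if verb_lemma in cfg["postverbal_subject_verbs"]:
        return "after"
    order = cfg["order"]
    return "before" if order.index("S") < order.index("V") else "after"


def find_subject(tokens, verb_index, lang):
    """Return the lemma of the nearest noun on the expected side of the verb.

    tokens is a list of (lemma, tag) pairs. None means no overt candidate
    was found, i.e. a possible zero subject.
    """
    verb_lemma = tokens[verb_index][0]
    if subject_side(lang, verb_lemma) == "before":
        search = reversed(tokens[:verb_index])
    else:
        search = tokens[verb_index + 1:]
    for lemma, tag in search:
        if tag == "n":
            return lemma
    return None


# "Me gustan los plátanos"
sentence = [("me", "prn"), ("gustar", "vblex"), ("el", "det"), ("plátano", "n")]
subject = find_subject(sentence, 1, "spa")
```

Here "gustar" is listed as a special verb, so the search goes to the right
of the verb and picks "plátano". Taking the nearest noun is a crude
stand-in: in an SOV language the nearest noun before the verb is more
likely the object, so a real system would need case or agreement
information on top of this.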

I explored the old and new methods for Zero Anaphora Resolution and, as
expected, the older methods were heuristic-based and the newer methods are
corpus-based.

"The same trend is observed also in Japanese zero-anaphora resolution,
where the findings made in rule-based or theory-oriented work (Kameyama,
1986; Nakaiwa and Shirai, 1996; Okumura and Tamura, 1996, etc.) have been
successfully incorporated in machine learning-based frameworks (Seki et
al., 2002; Iida et al., 2003)."
Exploiting Syntactic Patterns as Clues in Zero-Anaphora Resolution
<http://delivery.acm.org/10.1145/1230000/1220254/p625-iida.pdf?ip=14.139.82.6&id=1220254&acc=OPEN&key=045416EF4DDA69D9%2E1E2B3508530718A8%2E4D4702B0C3E38B35%2E6D218144511F3437&__acm__=1554037262_d6ed4c6c6a58decfdf858189b9a4202c>

Looking at the individual methods, old and new: the newer ones are mostly
ML-based and require a large amount of data. However, even the older
heuristic-based methods require a large amount of linguistic information,
such as syntactic trees, semantic knowledge, etc. Here are some examples:

1. An Empirical Study of Zero Anaphora Resolution in Chinese Based on
Centering Model <https://aclanthology.info/pdf/O/O01/O01-1011.pdf>

The Centering Model, a completely rule-based model and a very popular model
to perform Zero Anaphora Resolution, takes a syntax structure and its
semantic interpretation as input.

"The task of zero and nominal anaphora resolution is performed after the
semantic interpretation phase that converts the syntactic structure of a
sentence into a semantic representation form such as the logic form"

2. Exploiting Syntactic Patterns as Clues in Zero-Anaphora Resolution
<http://delivery.acm.org/10.1145/1230000/1220254/p625-iida.pdf?ip=14.139.82.6&id=1220254&acc=OPEN&key=045416EF4DDA69D9%2E1E2B3508530718A8%2E4D4702B0C3E38B35%2E6D218144511F3437&__acm__=1554037262_d6ed4c6c6a58decfdf858189b9a4202c>

Explores a hybrid mechanism but still uses ML to train models.

3. Zero Anaphora Resolution in Chinese with Shallow Parsing
<http://ken.myqnapcloud.com/ycchen/paper/JCLC_2007_V17_N1_04.pdf>

This is an interesting and inexpensive method, but it relies on Shallow
Parsing. If that's something that we can develop, this could be an
interesting avenue to explore.

4. Zero Pronoun Resolution with Attention-based Neural Network
<https://aclweb.org/anthology/C18-1002>
5. Discriminative Approach to Predicate-Argument Structure Analysis with
Zero-Anaphora Resolution
<http://delivery.acm.org/10.1145/1670000/1667611/p85-imamura.pdf?ip=14.139.82.6&id=1667611&acc=OPEN&key=045416EF4DDA69D9%2E1E2B3508530718A8%2E4D4702B0C3E38B35%2E6D218144511F3437&__acm__=1554039082_a3b9391fe40e6a19a75e523ee7b4caa4>
6. Japanese Zero Pronoun Resolution based on Ranking Rules and Machine
Learning <https://www.aclweb.org/anthology/W03-1024>

Modern methods, all based on Machine Learning.

---

So I guess what I'm trying to say is that zero pronoun resolution looks
like a problem in itself, closer to problems like Semantic Role Labeling,
and it may not do the problem justice to solve it under Anaphora
Resolution. Hence I'll explore this further and see whether it is feasible.
Maybe for now we could also focus on the method of assuming a pronoun and
resolving it to see results, at least for SVO languages.

Whether or not I decide to include Zero Anaphora Resolution in the project,
I will add this information to the proposal, as I don't want to be too
ambitious either. I will also include an analysis in the paper of why the
method of Saliency Scores is the best method given the current situation.

Thanks a lot Hector for your comments as they helped me realise the
importance of a problem I was putting under the umbrella of Anaphora
Resolution.

If anyone else has comments or suggestions, they'd be really appreciated
and will help me build a good tool for Apertium and its users :)

Thanks and Regards,
Tanmai Khanna


-- 
*Khanna, Tanmai*
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
