Hey Hector,

Thanks again for all the comments! I would also recommend that the mentors go through this email, as it explores an important part of Anaphora Resolution: Zero Anaphora Resolution.
> Also, in a language like Spanish there are quite a lot of time
> constructions like
> El lunes irá al médico
> (word by word translation: Monday will-go to-the doctor)
> It is very likely that "lunes" will be chosen as the subject of "irá".
> (The same for dates e.g.: El 3 de abril irá al médico = The 3 of April
> will-go to-the doctor.)

I was thinking about cases like this and wondering whether I can make an animate/inanimate/human distinction for nouns, or even just animate/inanimate, so that I can eliminate a lot of candidates easily. POS tags and gender information don't say much about this, so I'm still exploring the option, but it could be incorporated into the impeding parameters, which is a list that will be appended to.

> Assuming that a pronoun exists right before the verb is highly language
> specific. This works, as a rule, for SVO languages, like English, Spanish
> and Catalan, but will not work for SOV languages, like (typically) Turkic
> and Uralic languages (but also i.a. Hindi and German), and VSO, like i.a.
> Arabic and Celtic languages. As we have quite a lot of non-SVO languages in
> Apertium, searching a subject right before the verb seems a bad guess.
>
> Furthermore, even for a SVO language like Spanish, there are several quite
> often verbs for whom the subject is located after the verb, e.g.:
> Me faltan libros
> Me gustan los plátanos
> Me duelen las muelas
> etc.
> Or in SVO languages like Russian or Esperanto, it is not rare to place the
> subject after the verb, since the case tells us what is the subject.
>
> So, I think the system should deal with different language typologies, and
> probably would need some configuration to deal with "special verbs" in a
> specific language, like "faltar", "gustar", "doler" given in the Spanish
> examples.
> Of course, you can try which are the results in the EU corpus
> with the system you propose, but I don't think there will be a good
> percentage of success in German, Finnish and Hungarian, and, I guess, they
> will be worse in Slavic languages than in Romance and, of course, English.

Hmmm, I see your point. I guess for zero pronouns it becomes more of a Subject-Identification or Semantic-Role-Labelling problem than an Anaphora Resolution problem. It's true that assuming a zero pronoun right before the verb wouldn't work for non-SVO languages, and it's funny I missed this considering Hindi is my mother tongue.

I think it does make sense to build a system that deals with different language typologies, at least for zero pronouns. It isn't technically Anaphora Resolution, so I'm a little confused about the scope, but the problem I have to deal with is choosing the correct pronoun, so there is no point in restricting myself to traditional Anaphora Resolution.

I explored the old and new methods for Zero Anaphora Resolution and, as expected, the older methods are heuristic-based and the newer ones are corpus-based:

"The same trend is observed also in Japanese zero-anaphora resolution, where the findings made in rule-based or theory-oriented work (Kameyama, 1986; Nakaiwa and Shirai, 1996; Okumura and Tamura, 1996, etc.) have been successfully incorporated in machine learning-based frameworks (Seki et al., 2002; Iida et al., 2003)."
Exploiting Syntactic Patterns as Clues in Zero-Anaphora Resolution
<http://delivery.acm.org/10.1145/1230000/1220254/p625-iida.pdf?ip=14.139.82.6&id=1220254&acc=OPEN&key=045416EF4DDA69D9%2E1E2B3508530718A8%2E4D4702B0C3E38B35%2E6D218144511F3437&__acm__=1554037262_d6ed4c6c6a58decfdf858189b9a4202c>
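To make the typology point above a bit more concrete, here is a rough Python sketch of how the search for subject candidates could be made configurable per language. Everything in it is an assumption for illustration only: the `Token` and `LangConfig` classes, the `animate` flag, the `subject_candidates` function and the simplified tags are hypothetical and are not existing Apertium structures or APIs.

```python
from dataclasses import dataclass


@dataclass
class Token:
    lemma: str
    pos: str               # simplified, illustrative tags: "n", "prn", "vblex", "det", "pr"
    animate: bool = False  # hypothetical lexical feature (assumption)


@dataclass
class LangConfig:
    # Side of the verb on which an overt subject is normally expected.
    subject_side: str = "before"            # "before" or "after"
    # "Special verbs" whose subject usually sits on the opposite side.
    inverted_verbs: frozenset = frozenset()


def subject_candidates(tokens, verb_index, config, require_animate=False):
    """Return possible subjects for the verb, nearest first."""
    verb = tokens[verb_index]
    side = config.subject_side
    if verb.lemma in config.inverted_verbs:
        side = "after" if side == "before" else "before"
    if side == "before":
        window = tokens[:verb_index][::-1]   # walk leftwards from the verb
    else:
        window = tokens[verb_index + 1:]     # walk rightwards from the verb
    return [t for t in window
            if t.pos in ("n", "prn") and (t.animate or not require_animate)]


# Hypothetical Spanish configuration using the verbs from the examples above.
es = LangConfig("before", frozenset({"faltar", "gustar", "doler"}))

# "El lunes irá al médico": with the animacy filter, "lunes" is rejected,
# so no overt subject is found and a zero pronoun can be assumed.
lunes = [Token("el", "det"), Token("lunes", "n"), Token("ir", "vblex"),
         Token("a", "pr"), Token("el", "det"), Token("médico", "n", animate=True)]
no_subject = subject_candidates(lunes, 2, es, require_animate=True)

# "Me faltan libros": "faltar" is configured as inverted, so the search
# goes to the right of the verb and finds "libro".
faltan = [Token("me", "prn", animate=True), Token("faltar", "vblex"),
          Token("libro", "n")]
inverted_subject = [t.lemma for t in subject_candidates(faltan, 1, es)]
```

The animacy filter is left optional on purpose: it helps with "El lunes irá al médico", but it would wrongly discard the inanimate subjects of verbs like "gustar" or "faltar", so whether to apply it would probably have to depend on the verb.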
The newer methods are mostly ML-based and require a large amount of data. However, even the older heuristic-based methods require a large amount of linguistic information, such as syntactic trees and semantic knowledge. Here are some examples:

1. An Empirical Study of Zero Anaphora Resolution in Chinese Based on Centering Model
   <https://aclanthology.info/pdf/O/O01/O01-1011.pdf>
   The Centering Model, a completely rule-based and very popular model for Zero Anaphora Resolution, takes a syntactic structure and its semantic interpretation as input.
   "The task of zero and nominal anaphora resolution is performed after the semantic interpretation phase that converts the syntactic structure of a sentence into a semantic representation form such as the logic form"

2. Exploiting Syntactic Patterns as Clues in Zero-Anaphora Resolution
   <http://delivery.acm.org/10.1145/1230000/1220254/p625-iida.pdf?ip=14.139.82.6&id=1220254&acc=OPEN&key=045416EF4DDA69D9%2E1E2B3508530718A8%2E4D4702B0C3E38B35%2E6D218144511F3437&__acm__=1554037262_d6ed4c6c6a58decfdf858189b9a4202c>
   Explores a hybrid mechanism but still uses ML to train models.

3. Zero Anaphora Resolution in Chinese with Shallow Parsing
   <http://ken.myqnapcloud.com/ycchen/paper/JCLC_2007_V17_N1_04.pdf>
   An interesting and inexpensive method, but it relies on Shallow Parsing. If that's something we can develop, this could be an interesting avenue to explore.

4. Zero Pronoun Resolution with Attention-based Neural Network
   <https://aclweb.org/anthology/C18-1002>

5. Discriminative Approach to Predicate-Argument Structure Analysis with Zero-Anaphora Resolution
   <http://delivery.acm.org/10.1145/1670000/1667611/p85-imamura.pdf?ip=14.139.82.6&id=1667611&acc=OPEN&key=045416EF4DDA69D9%2E1E2B3508530718A8%2E4D4702B0C3E38B35%2E6D218144511F3437&__acm__=1554039082_a3b9391fe40e6a19a75e523ee7b4caa4>

6. Japanese Zero Pronoun Resolution based on Ranking Rules and Machine Learning
   <https://www.aclweb.org/anthology/W03-1024>

Items 4 to 6 are modern methods, all based on Machine Learning.

---

So I guess what I'm trying to say is that zero pronoun resolution seems like a problem in itself, closer to problems like Semantic Role Labeling, and it may not do the problem justice to solve it under Anaphora Resolution. Hence I'll explore this further and understand if it is possible. Maybe for now we could also focus on the method of assuming a pronoun and resolving it to see results, at least for SVO languages.

Whether or not I decide to include Zero Anaphora Resolution in the project, I will add this information to the proposal; I don't want to be too ambitious either. I will also include analysis in the paper as to why the method of Saliency Scores will be the best method given the current situation.

Thanks a lot Hector for your comments, as they helped me realise the importance of a problem I was putting under the umbrella of Anaphora Resolution. If others have comments or suggestions, they'll be really appreciated and will help me build a good tool for Apertium and its users :)

Thanks and Regards,
Tanmai Khanna

--
*Khanna, Tanmai*
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff