Hi all,

The Anaphora Resolution module is currently being tested on the Spanish-English pair, and it would be helpful to test it on several more pairs.

The module detects certain patterns and scores the potential antecedents of a pronoun based on these patterns, for example:

1. A noun that is part of a Prepositional Phrase is given a score of -1, as it is less likely to be the antecedent of a pronoun. So if the phrase is "groups of parliament", then *groups* is more likely to be the antecedent than *parliament*.
2. The first NP in a sentence is given a score of +1, as statistically the first NP is more likely to be the antecedent than the other nouns in the sentence.

There are several other antecedent indicators as well. All of them are based on thorough research and can be found in this paper: <https://link.springer.com/content/pdf/10.1023%2FA%3A1011184828072.pdf>.

Since NPs, PPs, etc. have different patterns in different languages, these patterns are defined in an external XML file, which will be different for each language pair. *You can find an example of this in the attached file: apertium-eng-spa.spa-eng.arx (Spanish-English).*

I need help from speakers of the following languages (or any others) to write this XML file for their language, so that I can test the system on that pair:

- Catalan
- Galician
- Portuguese

If you are working on a language pair and think that Anaphora Resolution can improve the translation, please reply to this message and we can work on adapting the tool for that pair :)

Thanks and Regards,
Tanmai Khanna

--
*Khanna, Tanmai*
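P.S. To make the scoring idea concrete, here is a rough Python sketch of how indicator scores could be summed per candidate. It covers only the two indicators described in this message, and it is not the module's actual code or the .arx format: the names `Candidate`, `score` and `best_antecedent` are made up for illustration, and the example assumes that "groups of parliament" opens its sentence.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A noun that could be the antecedent of a pronoun (hypothetical structure)."""
    word: str
    in_prep_phrase: bool = False  # noun is inside a PP, e.g. "parliament" in "groups of parliament"
    first_np: bool = False        # noun belongs to the first NP of its sentence


def score(candidate: Candidate) -> int:
    """Sum the indicator scores that apply to one candidate."""
    total = 0
    if candidate.in_prep_phrase:
        total -= 1  # nouns inside a PP are less likely antecedents
    if candidate.first_np:
        total += 1  # the first NP of a sentence is a more likely antecedent
    return total


def best_antecedent(candidates: list[Candidate]) -> Candidate:
    """Return the highest-scoring candidate (first one wins on a tie)."""
    return max(candidates, key=score)


# "groups of parliament", assumed here to be the start of the sentence.
candidates = [
    Candidate("groups", first_np=True),
    Candidate("parliament", in_prep_phrase=True),
]
best = best_antecedent(candidates)
print(best.word, score(best))  # groups 1
```

In the real module, which nouns count as "inside a PP" or "first NP" is what the per-language XML patterns decide; the sketch simply takes those flags as given.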
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff