On 2020-06-17 21:36, Hèctor Alòs i Font wrote:
Message from Francis Tyers <fty...@prompsit.com> on Wed, 17 June 2020
at 21:12:

On 2020-06-15 17:38, Hèctor Alòs i Font wrote:
Here come several practical examples. I tried to select them for their
variety. The result is more a wish list than something structured.

These really are great! Thanks :) Sorry the reply has taken so long.

Let's begin with "je la baise". Depending on the context this may be
"I kiss her" or "I fuck her". The context can tell us whether we are in
a formal or colloquial register. Another point is that in this case
anaphora resolution can also help us: if the pronoun refers to "hand",
it can only be "kiss"; if it refers to a person, the doubt persists.

For this, I would like to look at a concordance* of a large number of
examples to see what kind of information can be used to disambiguate.

Intuitively it seems like knowing the genre (e.g. formal/informal) would
help. But probably also statistics about subjects, objects and adjuncts,
and what they (co-)refer to.

* I tried to search on DuckDuckGo, but in the "internet" domain it is
very hard to find examples with "kiss", even with "moderated search"
turned on.

In fact, perhaps that could be a genre "safe translation"... :D

Incidentally Google gives "I fuck her" as the translation. I'm able to
get "kiss" by adding "bouche" or "main".

I think if we want to go by frequency we should have "fuck"; if we go
by safety we should have "kiss".

Probably "humblement" or "vous" are also good indicators of the "kiss"
meaning.

Anything better than that would require further investigation with a
concordance.
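
As a rough illustration of the cue-word idea, here is a minimal sketch.
Only the four cue words come from this thread; the function name, the
tokenisation and the `default` switch (frequency vs. safety) are
assumptions made for the example, and a real rule would need the
concordance work described above.

```python
# Cue words mentioned in this thread as pointing to the "kiss" reading.
KISS_CUES = {"bouche", "main", "humblement", "vous"}


def translate_baiser(sentence: str, default: str = "fuck") -> str:
    """Pick a translation for 'baiser' from the surrounding tokens.

    default="fuck" goes by frequency, default="kiss" goes by safety.
    The function name and whitespace tokenisation are hypothetical.
    """
    # Lowercase each token and strip common punctuation from its edges.
    tokens = {tok.strip(".,;:!?\"'").lower() for tok in sentence.split()}
    # Any cue word in the sentence selects "kiss"; otherwise use the default.
    return "kiss" if tokens & KISS_CUES else default
```

Called on "Je la baise" this falls back to the default, while "Je vous
baise humblement la main." selects "kiss" through the cue words.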

In terms of the module, if we want to do informal/formal then my
previous suggestion would work fine.

Another kind of problem is the Arpitan words "chamô" ("camel"; plural
"chamôs", "camels") and "chamôs" ("chamois"; unchanged in the plural).
So, translating into French yesterday, I got chamois in a Bible text
from Exodus xD  I solved it by deciding in a CG rule that every "chamôs"
with nothing singular around it is a camel.

As this is a different morphological paradigm, I would go with the
superscript notation ¹²³...

(Similar cases in French: fil/fils, foi/fois, cour/cours)

These have different lemmas, e.g.

^fils/fil<n><m><pl>/fils<n><m><sp>$     threads / son*
^fois/foi<n><f><pl>/fois<n><f><sp>$     faiths / time*
^cours/cour<n><f><pl>/cours<n><m><sp>$  courts / course*

The 'cour/cours' example can potentially be disambiguated by the gender.
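
To make the gender idea concrete, here is a small sketch that splits a
lexical unit written as in the examples above and keeps only the
readings with a given gender tag. The function names are made up for
the illustration, the parsing ignores escaped characters and multiword
units, and the gender is assumed to come from somewhere else (e.g. an
agreeing determiner or adjective).

```python
import re


def parse_lexical_unit(unit: str):
    """Split '^surface/lemma<tag>.../lemma<tag>...$' into its parts.

    Simplified: does not handle escaped '/' or '^' inside the unit.
    """
    surface, *analyses = unit.strip().lstrip("^").rstrip("$").split("/")
    readings = []
    for analysis in analyses:
        lemma = analysis.split("<", 1)[0]
        tags = re.findall(r"<([^>]+)>", analysis)
        readings.append((lemma, tags))
    return surface, readings


def filter_by_gender(readings, gender):
    """Keep readings carrying the gender tag; keep all if none match."""
    kept = [reading for reading in readings if gender in reading[1]]
    return kept or readings


surface, readings = parse_lexical_unit("^cours/cour<n><f><pl>/cours<n><m><sp>$")
masculine = filter_by_gender(readings, "m")
```

With a masculine context only the 'cours' reading survives. For 'fils'
both readings are masculine, so the same filter leaves the ambiguity
untouched.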

For the others I suppose rules could be written, but I suspect they
would be quite brittle. My guess is that the <sp> ones are more
frequent. So those should be the default, then the question is finding
specific contexts where it should be the others. A concordance would
help, but I'm not sure how they would be split by genre or semantic
field. This is really a problem with how world-knowledge is encoded.

I wonder if something could be done with word embeddings here. For
example my guess is that in the target language the two variants should
not be close in the vector space. And they should be closer to words in
the same semantic field. This could then be something like a reweighting
of the translations according to target language semantic coherence.

Note that it would require information to be "backpropagated" from the
target language to the source language. Perhaps you could have something
like per-reading embeddings that are trained using target language
information, so e.g.

(fils, fil<n><m><pl>)  [0.323, 0.423, 0.11, 0.595]
(fils, fils<n><m><sp>) [0.53, 0.605, 0.54, 0.639]
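
A toy sketch of how such per-reading vectors might be used to rerank
readings: score each reading by its mean cosine similarity to vectors
for the surrounding context and pick the highest. The two reading
vectors are the illustrative numbers above; the context vector, the
function names and the scoring rule are all assumptions for the sake of
the example, not a trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_readings(reading_vectors, context_vectors):
    """Sort readings by mean cosine similarity to the context vectors."""
    def score(vec):
        return sum(cosine(vec, c) for c in context_vectors) / len(context_vectors)
    return sorted(reading_vectors, key=lambda r: score(reading_vectors[r]),
                  reverse=True)


# Illustrative per-reading vectors from the message above.
READINGS = {
    ("fils", "fil<n><m><pl>"): [0.323, 0.423, 0.11, 0.595],
    ("fils", "fils<n><m><sp>"): [0.53, 0.605, 0.54, 0.639],
}

# Made-up context vector standing in for the surrounding words.
context = [[0.5, 0.6, 0.5, 0.6]]
best = rank_readings(READINGS, context)[0]
```

With this made-up context the <sp> reading ranks first; a context vector
nearer to the first row would flip the ranking.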

Felipe did something like this in his thesis, but he only looked at
sequences of part of speech tags. Here we need information about the
actual analyses.


Btw,
https://fruct.org/publications/fruct25/files/Mor.pdf

This paper seems to do something similar, e.g. they use target language
information to disambiguate source language tokens where there
is an ambiguity caused (in this case) by orthography.

Fran


_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
