On Sat, Jun 13, 2020, 11:50 Francis Tyers <fty...@prompsit.com> wrote:

> On 2020-06-13 15:20, Tino Didriksen wrote:
> > I would like everyone to read and seriously consider this thread and
> > give your opinion. This meanders a bit, so please read it all.
> >
>
> Here is a non-exhaustive list of potential pitfalls of using the "surface
> form is a tag" thing. As far as I understand the objective is to be able to
> put the original surface form in the output translation as an unknown token
> instead of the lemma.
>

Could you provide disambiguated analyses of each of these?  It's hard to
picture what the problem is for people who can't at least do the relevant
tokenisation in their heads.  (I'm not familiar with the example in (2)—I
imagine other people are similarly uninformed about the other examples.)

--
Jonathan

> 0) languages without spaces in the writing system:
>
>     what is a surface form here? is it just the longest token matched?
>
> 1) compounds
>
> i)  infrastruktuurontwikkelingsplan, does each part of the compound get
>      the surface form tag? if so, what happens if one part of the compound
>      is translated but the other parts aren't, e.g. would you get
>      *infrastruktuurontwikkelingsplan *infrastruktuurontwikkelingsplan plan?
>
> 2) contractions
>
> i)  chawe - if you attach the surface form to both and both are unknown, do
>      you get both in the output? if you only attach it to one, which one do
>      you attach it to, where is that decision made?
>
> ii) dárselo - if you attach the surface form to the clitic pronouns in
>     addition to the verb, what happens if the verb is not in the dictionary
>     but the clitic pronouns are? do you get the surface form and the
>     translations in the output?
>
> I think that the appropriate way to deal with this is by coming up with a
> clear plan for the linguistic eventualities. I don't see that in the current
> proposal. I have been showing Tanmai through the creation of a new MT system,
> and we have been documenting these issues as they arise. I don't think it
> makes sense to start development before they have been resolved.
>
> Fran
>
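
To make the compound case in 1) concrete, here is a minimal Python sketch,
not Apertium code. The Unit class, the render_naive function, and the
three-way split of the compound are all assumptions made up for
illustration. It only shows what happens if every part of a compound
carries the full surface form and each unknown part is printed as
"*" + surface form.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Unit:
    """One analysed part of a token (hypothetical structure, not Apertium's)."""
    lemma: str
    surface: str                       # surface form carried along as a tag
    translation: Optional[str] = None  # None means: not in the bilingual dictionary


def render_naive(units: List[Unit]) -> str:
    """Print the translation if there is one, otherwise '*' + carried surface form."""
    return " ".join(
        u.translation if u.translation is not None else "*" + u.surface
        for u in units
    )


compound = "infrastruktuurontwikkelingsplan"

# Assumed segmentation, for illustration only: the first two parts are
# treated as unknown, the last part is treated as translated.
parts = [
    Unit("infrastruktuur", compound),
    Unit("ontwikkeling", compound),
    Unit("plan", compound, translation="plan"),
]

output = render_naive(parts)
print(output)
```

Under these assumptions the whole surface form is printed once per unknown
part, which is the duplicated output asked about in 1) i). Where to
suppress the repeats is exactly the decision that the sketch leaves open.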
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff