On 2020-06-29 14:18, Hèctor Alòs i Font wrote:
Message from Francis Tyers <fty...@prompsit.com> on Sun, 28 June 2020
at 15:10:

On 2020-06-28 12:10, Hèctor Alòs i Font wrote:
Message from Francis Tyers <fty...@prompsit.com> on Sun, 28 June 2020
at 13:38:

On 2020-06-28 11:11, Hèctor Alòs i Font wrote:
Hi Tanmai,

I had been trying to add quotes in other pairs, and my experience was
that the results were worse. If you add them, the morphological
disambiguation rules and transfer rules will be broken, for example
in: Hizo un "duro" trabajo.

This is an interesting case. Of course they are useful for indicating
reported speech, but we don't want them to break transfer like that,
and in general they should be a token. This is a very good point
and something that should be thought about in Apertium.

In any case, if you add them, I think this does not affect the other
translators, at least as long as the morphological analyser is not
retrained.

You mean the part of speech tagger?

Yes.

As for surnames that are ambiguous, these must be added. In principle,
there are already disambiguation rules in apertium-spa to deal with
these cases. You will probably need to copy them into apertium-scn, if
you have not already done so.

Ok, great.

Finally, which orthographic standard for Sicilian have you chosen? I
tried it out a few months ago and the Sicilian version of the UNESCO
Courier (Lu Currieri di l'UNESCO) was very poorly recognized.


We're only doing one direction, Sicilian->Spanish, and we're using
Wikipedia, so the "language model" is a bit of a mess. We recently
found a dictionary of ~19k words, so that could prove to be a base
in the future.

We have a coverage of ~90% on the Wikipedia (although it's very
proper-name heavy), but it also depends very much on dialect and
orthography. So here are ten random sentences with translations:

L' annu luci nun eni na unitati di misura du tempu e mancu da quantitati
di luci..
El año luz no es una unidad de medida dos tiempo y tampoco de la
cantidad de luz..

Muriu a Roma u 1 Jnnaru 1713. Fu biatificatu du Papa Piu VIII u 29
settemmiri 1803.
Murió a Roma el 1 *Jnnaru 1713. Fue *biatificatu dos Papá Más VIII el 29
*settemmiri 1803.

Munti San Savinu è nu cumuni dâ pruvincia di Arezzu. Havi na
pupulazzioni di 8'128 abbitanti.
Monte Santo *Savinu es una comuna de la provincia ~de #Arezzo<np><loc>.
Tiene una población de 8.128 habitantes.

Nuvara faci parti dû Piemonti ma si parra lu lummardu nzèmmula a lu
talianu.
#Novara hace parte del *Piemonti pero se habla el lombardo juntos al
italiano.

La sìmula è nu prudottu ntermediu dâ macinazzioni di lu granu duru ca,
rimacinatu, veni trasfurmatu 'n  farina..
La harina es un producto *ntermediu de la *macinazzioni del trigo duro
que, *rimacinatu, se transforma en harina..

Lu tango è nu tipu di cumpusizzioni musicali in 2/4, è nu abballu
pupulari. Nasci a Buenos Aires (Argentina)..
El tango es un tipo de composición musical en 2/4, es un baile popular.
Nace a Buenos Aires (Argentina)..

O paisi di Adranu ci su tanti giurnali e tanti emittenti. I cchiù famusi
su': TVA, RSI, Symmachia e "La Locomotiva"..
O país de Adrano sobre tantos revistas y tantas *emittenti. Las más
famosas sobre': *TVA, *RSI, *Symmachia y "La *Locomotiva"..

Tra li cità cchiù mpurtanti dû massicciu cintrali ci sunnu Limoges e
Clermont-Ferrant..
Entre las ciudades más importantes del macizo central #hay Limoges y
Clermont-*Ferrant..

Lu catu è nu ricipienti speci di forma circulari usatu pi lavàrisi o pi
lavari panni, piatti, virduri, ecc..
El cubo es un recipiente suerte de forma *circulari usada por lavarse o
por lavar paños, platos, verduras, etc..

A cchiù cumpleta raccolta ri materiali è da BBC, fu a surgenti
principali pu muntaggiu du DVD. Duranti a produzioni du DVD ufficiali,
MTV pristò a la Woodcharm Ltd. i soi B-roll e materiali ri
contru-inquatraturi; chista è stata na surgenti aggiuntiva pu matiriali
USA ca appàri nto DVD ufficiali.
La más completa *raccolta de material es de la *BBC, fue la manantial
principal por el *muntaggiu dos DVD. Durante la producción dos DVD
oficial, *MTV prestó a la *Woodcharm *Ltd. las suyas *B-*roll y
materiales de contra-*inquatraturi; esta es estada una manantial
*aggiuntiva por el material USA que aparece en el DVD oficial.

Ernest Ropiequit "Jack" Hilgard (1904 - 2001) a statu nu pissicoluggu
statunitenzi, prufissuri a l' univirsitati di Stanford, ca addivintau
famosu 'nta l' anni '50 ppi li so ricerchi supra a l' ipnosi,
spiciarmenti supra lu cuntrulli di lu duluri.
Ernest *Ropiequit "Jack" *Hilgard (1904 - 2001) ha estada un
*pissicoluggu estadounidense, profesor a la universidad de Stanford, que
se hizo famoso en los años '50 por los sus rebuscas sobre los *ipnosi,
especialmente sobre el controlas del dolor.
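
A rough way to put a number on samples like the ones above is to count
the words that come out starred. The following is only a minimal sketch
in Python, assuming that a leading "*" marks a word the analyser did not
recognise (the usual Apertium convention) and that leftover tags such as
<np><loc> should be ignored. The function names are hypothetical, and
this is not how the ~90% figure quoted above was computed; coverage is
normally measured on the analyser output for the source text, not on the
translation.

```python
import re

# Leftover morphological tags such as <np><loc> in the translated output.
TAG = re.compile(r"<[^<>]*>")
# A run of word characters, optionally preceded by the unknown-word mark '*'.
WORD = re.compile(r"\*?\w+")


def count_unknown(output: str) -> tuple[int, int]:
    """Return (unknown_words, total_words) for one translated string."""
    text = TAG.sub("", output)          # drop tags so they are not counted as words
    tokens = WORD.findall(text)         # '#' and '~' marks are skipped, the word is kept
    unknown = sum(1 for tok in tokens if tok.startswith("*"))
    return unknown, len(tokens)


def naive_coverage(output: str) -> float:
    """Share of words without a '*' mark; 1.0 for empty input."""
    unknown, total = count_unknown(output)
    return 1.0 if total == 0 else 1 - unknown / total


if __name__ == "__main__":
    sample = "Murió a Roma el 1 *Jnnaru 1713."
    print(count_unknown(sample))        # (1, 7)
```

Words marked with "#" are counted as known here, because the analyser
did recognise them and the problem arose later in the pipeline; a
stricter measure could count them separately.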

Where is the text you tried out? We could potentially use it in the
evaluation.

Fran

A few months ago I considered working on Sicilian instead of Arpitan.
I decided against Sicilian because I understood that apertium-scn does
not follow the Cademia Siciliana standardisation
https://cademiasiciliana.org/

That's true, the core of it was developed in 2016, before the Cademia
was founded.

I think working with the standard is an excellent thing though. Do you
know if they have a reference dictionary? It should be possible
to map from whatever we have to whatever they have, and we
already cover quite a lot of the variation.

No, I don't know if they have one.
Yes, the variation is a big problem, but which variety are you going
to generate? Or won't you generate Sicilian?


In the first version of the system we're not planning on generating Sicilian,
just Spanish.

The idea is to make a prototype and then look for someone who knows
Sicilian and has an orthography they like, who'd like to
either do the opposite direction or do e.g. Italian→Sicilian.

Fran



_______________________________________________
Apertium-catala mailing list
Apertium-catala@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-catala
