On Mon, 15 Oct 2012 at 10:22 +0200, Per Tunedal wrote:
> Hi again,
> Well, if internal consistency is so important in Apertium, it's very
> odd that there hasn't been, from the very beginning, an automatic way of
> trimming the dictionaries, as described in the wiki:
> http://wiki.apertium.org/wiki/Automatically_trimming_a_monodix . When I
> first learned of this, I was surprised that it was not somehow
> integrated into the build process.

There are lots of things that aren't integrated that we'd like. Too many
mouths, not enough hands.

> As I found this counter-productive, I assumed that a solution would be
> to simply kill this darling.

It isn't counterproductive for me, or for many other Apertium developers.
At the moment I'm afraid it's a case of like it or lump it (or fix it).

> Anyhow, there are differences between monolingual dictionaries, other
> than the words included, that have to be taken into account somehow. For
> instance, the terminology might have to be standardized or "translated".
> This concerns, for instance, genders of nouns and cases of pronouns.

This is what we use the bilingual dictionary and transfer rules for. 

> In my opinion, the terminology of the language being treated should be
> used, along with the equivalent in English. That would result in e.g.
> Akkusativ and Dativ for German, and Objet direct and Objet indirect for
> French.

No. One of the huge benefits of Apertium over other disparate tools is
that the tagset is broadly homogeneous. I would strongly argue against,
and vote against, any move away from this.

> I don't think it's any problem if the two (or more) monolingual
> dictionaries in a language pair use different terminology. In the bidix
> I simply use the conventions for the left language to the left and for
> the right language to the right. Or have I overlooked something?

You've overlooked that it is a pain in the arse. How about you try
building a language pair both ways, and see how frustrating it is when
the tagsets are different, and how easy it is when they are the same?
Then you would know. I mean, I can tell you every day until the cows
come home, but until you experience it yourself you will have no idea.

> For example, the Norwegian monolingual dictionaries (nb/nn) use the
> cases nom=Nominativ, acc=Akkusativ and gen=Genitiv, and the Swedish
> monolingual dictionary uses the cases subj=Subjektsform, obj=Objektsform
> and gen=Genitiv. There isn't any need to change it, as far as I can see.

This was an oversight carried over from the original Swedish/Danish
analysers. I would be very happy to change to nom/acc/gen in the
Swedish-Danish pair.
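
A rough sketch of what that rename could look like, assuming the tags
appear in the dictionaries as <s n="..."/> symbol references and
<sdef n="..."/> definitions. The function name rename_tags and the sample
entry are made up for illustration, and the mapping comes only from the
tag names mentioned in this thread. Any transfer rules that refer to the
old names would need the same change, which this sketch does not attempt.

```python
import re

# Hypothetical mapping, based only on the tag names discussed in this thread.
TAG_MAP = {"subj": "nom", "obj": "acc"}

# Matches symbol references (<s n="subj"/>) and definitions (<sdef n="subj" .../>).
# It does not match <section ...>, because whitespace must follow "s" or "sdef".
SYMBOL = re.compile(r'(<s(?:def)?\s+n=")([^"]+)(")')


def rename_tags(dix_text, tag_map=TAG_MAP):
    """Return dix_text with every mapped tag name replaced; others are kept."""
    def swap(match):
        name = match.group(2)
        return match.group(1) + tag_map.get(name, name) + match.group(3)
    return SYMBOL.sub(swap, dix_text)


# Made-up entry, only to show the effect of the rename.
sample = '<e><p><l>mig</l><r>jag<s n="prn"/><s n="obj"/></r></p></e>'
print(rename_tags(sample))
```

Running it on a copy of the dictionary and diffing the result before
committing would be the cautious way to use something like this.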

> Further, the paradigms differ among versions. I still don't understand
> the implications of the different ways to treat personal pronouns in the
> monodix (and bidix!) between the pairs Icelandic (is) - Swedish (se) and
> Swedish (se) - Danish (da). The "Icelandic" way looks neater and more
> elegant, though. Maybe some standardization would make it easier to
> create new pairs.

Fixing the personal pronouns is about five minutes' work. You can redo it
if you like, but I don't see any pressing need.

I have an idea: why don't you try both ways and see which one works
better for you?

Fran


_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
