Hi,
I'm pleased to hear of Jonas Fromseier Mortensen's plans to start
working on Norwegian-Danish (no-da), including both bokmål (nb) and
nynorsk (nn). That would make it much easier for me to realize my
original plan to set up the pair Norwegian-Swedish (no-sv), likewise
including both bokmål (nb) and nynorsk (nn).

What has kept me from starting the work so far is that I was
pushed into first fixing "some minor issues" with the pair
Swedish-Danish (sv-da). OK, I'll give it a week, I thought, and I have
now spent a year! My goal was to fix the most blatant errors and extend
the dictionaries to include more words used in ordinary life, rather
than in the EU Parliament. Further, I wanted to release the other
translation direction, Danish to Swedish (da-sv).

Status as of today:

1. I've fixed some errors but many are yet to be found and tackled. Some
errors might be fixed by retraining the tagger, writing some clever
transfer rules and using the new disambiguator: that remains for me to
try.
2. I've added quite a few new words, mainly by:
a) adding entries from the pair Icelandic-Swedish (is-sv)
b) panning for gold in various sources, using wish-lists of Danish and
Swedish words.
- I hoped that many of the words would "meet in the middle", i.e. would
be present in both monodixes, letting me just add the translation in
the bidix. Unfortunately, this only happened for about a third of the
added words. Consequently, I have to add some words manually to the
monodixes.
- By now, I've added most of the wanted nouns and verbs that I found. I
have simply skipped all words I haven't managed to translate
effortlessly.
- Many common adjectives and adverbs remain to be added.
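
To illustrate the "meet in the middle" check, here is a minimal sketch in
Python. It is not part of any Apertium tool: the function name, the data
layout (plain sets of lemmas and a list of word pairs) and the toy words
are all assumptions made for the example.

```python
def split_wishlist(pairs, da_lemmas, sv_lemmas):
    """Split (Danish, Swedish) pairs by whether both lemmas already exist.

    Hypothetical helper: pairs found in both monodixes only need a bidix
    entry; the rest need manual work in at least one monodix.
    """
    bidix_only, needs_monodix = [], []
    for da, sv in pairs:
        if da in da_lemmas and sv in sv_lemmas:
            bidix_only.append((da, sv))
        else:
            needs_monodix.append((da, sv))
    return bidix_only, needs_monodix


# Toy data, invented for illustration only.
da_lemmas = {"hus", "bil"}
sv_lemmas = {"hus", "cykel"}
pairs = [("hus", "hus"), ("bil", "bil"), ("cykel", "cykel")]

bidix_only, needs_monodix = split_wishlist(pairs, da_lemmas, sv_lemmas)
print(len(bidix_only), "pairs meet in the middle")
print(len(needs_monodix), "pairs need a monodix entry first")
```

In practice the lemma sets would have to be extracted from the real
dictionaries first; that step is left out here.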

Further, I've added quite a few abbreviations and some common false
friends I know of. I've also started some work on pronouns - many are
still missing.

Working with the bidix has revealed that many of the words in the Danish
dictionary (much larger than the Swedish dictionary) are simply
non-existent. All the same, they are nicely put into the monodix with
valid paradigms. Apparently, one or more of the semi-automatic tools has
run amok. This is a minor problem for me, as they will all go away
when I trim the dictionaries, but it might be a nuisance for Jonas while
working on the new pair Norwegian-Danish (no-da).

Another problem is that my knowledge of Danish is very limited. I have
tried to make some informed guesses, with the help of dictionaries and
an introductory grammar. All the same, some of my entries, especially in
the Danish monodix, might be erroneous. It might be a good idea to take
a glance at them (marked by my initials PT), for example by expanding
the monodix and looking for odd entries, or by translating some test
texts and spotting errors.

The translation is still very poor, and unfortunately I believe that
this is very hard to fix. I've identified the tagger and word
disambiguation as the critical steps. I've come to the conclusion that
it's silly to let the tagger choose one and only one translation. A
better disambiguation would be most helpful. Maybe it would be possible
to translate all possible matches, disregarding the part of speech, and
later choose the translation that makes most sense/is the most fluent in
the target language? Or use a disambiguator instead of the tagger? I
will gladly discuss this in a separate thread.
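
As a starting point for that discussion, the idea of translating every
possible match and then picking the most fluent result could be sketched
roughly as below. This is only a toy: the bigram-count scoring, the
function names and the tiny Swedish corpus are my assumptions for the
example, not how Apertium works today.

```python
from collections import Counter


def bigrams(tokens):
    """Return adjacent word pairs of a token list."""
    return list(zip(tokens, tokens[1:]))


def train_bigram_counts(sentences):
    """Count bigrams in a (toy) target-language corpus."""
    counts = Counter()
    for sentence in sentences:
        counts.update(bigrams(sentence.split()))
    return counts


def fluency(candidate, counts):
    """Score a candidate by how many of its bigrams occur in the corpus."""
    return sum(1 for bg in bigrams(candidate.split()) if counts[bg] > 0)


def pick_most_fluent(candidates, counts):
    """Choose the candidate translation with the highest fluency score."""
    return max(candidates, key=lambda c: fluency(c, counts))


# Toy target-language corpus and candidate translations (invented).
corpus = ["jag ser en fluga", "en fluga flyger"]
counts = train_bigram_counts(corpus)
candidates = ["jag ser en flyger", "jag ser en fluga"]

best = pick_most_fluent(candidates, counts)
print(best)
```

A real system would need a much larger corpus and a proper language
model, but the selection step itself would look similar.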

Right now, I'm quite busy with other projects, so I cannot do much work
on Apertium. On the other hand, I'm always interested in having a
discussion.
Yours,
Per Tunedal

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
