On Wed, 12 Sep 2012 at 15:35 +0200, Per Tunedal wrote:
> Hi again,
> unfortunately there are 50 000 lines with errors :-(

I've seen worse ;)

> Apparently, it's infeasible to correct them all by hand. 

No it isn't. :)

> Most errors are
> of the @ type, word not in the bidix (and probably not in the Swedish
> monodix either).

$ cat /tmp/da-sv.testvoc | grep '@' | wc -l
43662

$ cat /tmp/da-sv.testvoc | grep '@' | grep '<n>' | wc -l
23618

Divide the number of nouns by 8 (each noun has ~8 forms -- sg/pl,
def/indef, nom/gen) and you get 2952.25. It's possible to translate
300-500 words a day, which makes adding the translations about a
week's work.

$ cat /tmp/da-sv.testvoc | grep '@' | grep '<vblex' | wc -l
19774

There are 9 forms per verb, making it around 2197 translations, another
week or so. 

Then the remaining ~270 errors shouldn't take more than a couple of days
to fix.

So, it's about half a month's work in total.
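
The arithmetic above can be sketched as a small shell script. The
function names (lemmas, days) are made up for illustration; the
forms-per-lemma figures and the 300-500 words/day rate are the ones
quoted in this mail, and integer division drops the fractional lemma.

```shell
#!/bin/sh
# Rough work estimate from the testvoc counts quoted above.
# Assumptions: 8 forms per noun, 9 forms per verb, 300-500 words/day.

# lemmas FORMS FORMS_PER_LEMMA -> approximate number of lemmas (integer division)
lemmas() {
    echo $(( $1 / $2 ))
}

# days WORDS RATE -> working days needed at RATE words/day, rounded up
days() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

nouns=$(lemmas 23618 8)
verbs=$(lemmas 19774 9)
echo "nouns: $nouns lemmas, $(days "$nouns" 500)-$(days "$nouns" 300) days"
echo "verbs: $verbs lemmas, $(days "$verbs" 500)-$(days "$verbs" 300) days"
```

Taking the fast and slow rates separately gives a range rather than a
single number, so "a week" is the optimistic end of the estimate.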

> I'm thinking about some automatic solution.
> 1) adding the Swedish words from SALDO (and words from is-sv).

This is a bad idea -- you'll mess up the testvoc in the other direction.
You should first work out the translations, then only add the ones you
have translations for.
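
As a minimal sketch of "only add the ones you have translations for":
assuming a hypothetical candidate list with one Swedish lemma per line
and a hypothetical tab-separated file of Danish/Swedish pairs (neither
file format comes from SALDO or Apertium, they are just for
illustration), the filter could look like this.

```shell
#!/bin/sh
# translated_only CANDIDATES PAIRS
#   CANDIDATES: one Swedish lemma per line (hypothetical format)
#   PAIRS:      Danish<TAB>Swedish, one pair per line (hypothetical format)
# Prints only the candidates that appear as the Swedish side of some pair.
translated_only() {
    awk -F '\t' '
        NR == FNR { have[$2] = 1; next }   # first file: remember Swedish side
        $1 in have                         # second file: keep known lemmas
    ' "$2" "$1"
}

# Hypothetical usage:
#   translated_only saldo-lemmas.txt da-sv-pairs.tsv > to-add.txt
```

Anything the filter drops stays out of the monodix until it has a
translation, so the testvoc in the other direction is not affected.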

> 2) building a bidix from bilingual corpora

This would be ok, but in my experience, setting up Moses/GIZA/etc. for
the first time can take up to a week. And in the end you have to
postedit (check every word) in the lists anyway.

> What's the most appropriate way to continue?

I would just do it manually, it's really not that much work when you
break it down into manageable chunks.
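
One way to get the chunks, as a rough sketch: pull the unique unknown
lemmas for one part of speech out of the testvoc output and split the
list into day-sized files. This assumes the '@' sits directly before the
lemma and the lemma is followed by its first tag (as in ^@hus<n>...);
check that against your own output first. The function name and the
output file names are made up.

```shell
#!/bin/sh
# unknown_lemmas FILE TAGPATTERN
#   Prints the unique lemmas of '@'-marked lines containing TAGPATTERN
#   (a fixed string such as '<n>' or '<vblex').
#   Assumption: unknown forms look like ...@lemma<tag>...
unknown_lemmas() {
    grep '@' "$1" | grep -F -- "$2" \
        | sed 's/.*@\([^<]*\)<.*/\1/' \
        | sort -u
}

# Hypothetical usage: 400 nouns per file, named nouns.aa, nouns.ab, ...
#   unknown_lemmas /tmp/da-sv.testvoc '<n>' | split -l 400 - nouns.
```

Each resulting file is then roughly one day's worth of translations.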

Fran


_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
