Re: [Apertium-stuff] Paradigms in Bidixes

Per Tunedal Sun, 11 Nov 2012 06:55:33 -0800

Hi,

On Sun, Nov 11, 2012, at 14:27, Francis Tyers wrote:
> El dg 11 de 11 de 2012 a les 14:11 +0100, en/na Per Tunedal va escriure:
> > Hi,
> > OK. I just thought the other way around:
> > 
> > Because coverage is so low, it would be fruitful to generate
> > translations for unknown words.
> > 
> > In the next step, I intended to add the most frequent words, bit by bit.
> 
> Great!
> 
> > As you have pointed out, it's much more effective to have a word in the
> > dictionaries than to generate it by some rule. Thus the gain is
> > obviously largest from adding the most frequent compounds and
> > derivations explicitly in the dictionaries. But it's still nice to get
> > translations of the more rare compounds and derivations.
> 
> Bad investment in terms of time. You want your work to have maximum, not
> minimum impact. Thus, work by frequency. Add the frequent stuff first.


I agree. Obviously, I expressed myself poorly: I just meant that
compounding would be useful for the less frequent words, without
significantly reducing speed.

> 
> > See my comments below.
> > 
> > Yours,
> > Per Tunedal
> > 
> > On Sun, Nov 11, 2012, at 11:48, Francis Tyers wrote:
> > > El dg 11 de 11 de 2012 a les 10:46 +0100, en/na Per Tunedal va escriure:
> > > > Hi again Mikel,
> > > > do you have any examples of this? I need to see all the "XML clutter" to
> > > > understand how to use it practically.
> > > > 
> > > > This general translation of some word categories might be useful for
> > > > Swedish (sv) - Danish (da) and very useful for Norwegian (no) - Swedish
> > > > (sv). There are a lot of words that behave just as in your example.
> > > 
> > > Don't try and do derivational morphology in the bilingual dictionary.
> > 
> > Why? I just thought this might be interesting to try out.
> 
> Because it causes more problems than it solves. 

I see.

> 
> > > 
> > > > Further:
> > > > 
> > > > I am reflecting on the best way of treating prefixes, used to change the
> > > > meaning of a word. First I thought of attacking it as a compound, but
> > > > I'm not sure that's the best way. Maybe something like your example
> > > > would be better? Or even a third solution?
> > > 
> > > Don't do it. Work on stuff that is really going to affect the quality of
> > > the translation. 
> > 
> > Well, the most blatant errors are:
> > 1. Low word coverage. And I just wanted to try a solution that quickly
> > increases the coverage. Then there wouldn't be any panic for adding more
> > words, but it would increase the translation quality (and speed) one
> > step further. It would be a pleasure, not a plight to add new words. 
> 
> There is no solution that quickly improves the coverage, without quickly
> adding words. If you can't manage adding a few words, then I think that
> MT is not for you.

Well, I could always fall back on statistical MT, couldn't I? All the
same, I would like to try out Apertium. Rule-based translation is
interesting: I learn more about languages at the same time as I learn
about Apertium.

> 
> > 2. Strange errors probably due to mistakes made by the tagger. And
> > you've told me that it isn't any use to train the tagger before adding
> > some 20 000 words. That would take me some 20 years. It's simply out of
> > the question.
> 
> If you think adding 20,000 words would take 20 years then you must be a
> very slow worker. For me, it would take about two months full time, or 6
> months part time. Perhaps a year, working for an hour/day. Are you really
> saying that you are more than 20--40 times slower than me? I mean, it's
> a fairly simple task, I find it hard to believe that there could be such
> a huge difference in productivity.
> 
> Try to measure your productivity over an hour -- or half an hour. And
> tell us how much it is, and how you've been working -- how you approach
> the task. It could be that you are just working really inefficiently and
> we can help you get up to normal speed.

Well. The largest problem is that I have a very limited knowledge of
Danish and not many resources available. My main goal is to translate
Norwegian: I have by now acquired some interesting books and taken a
short university course in Norwegian.

The second problem is that I hate editing XML files, as it's so easy to
make mistakes. And I have to learn a lot of codes/tags that I'm not
really interested in. But I will manage. I have printed the Apertium
manual and will read it. I hope it will help.

> 
> > Thus, I would have to use some other strategy. I will try different
> > strategies to add a large number of words at a time.
> 
> Add words by frequency. Use existing resources. Generate candidates
> using scripts, then postedit them.
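
A minimal sketch of the "work by frequency" step, assuming you already
have a translated corpus in which Apertium has marked unknown words with
a leading asterisk (its usual marker for unknown words). The function
name rank_unknown_words and the sample sentence are made up for
illustration; they are not part of Apertium. The idea is simply to count
the unknown words and list the most frequent ones first, so those get
added to the dictionaries before the rare ones.

```python
import re
from collections import Counter

# Unknown words are assumed to appear as "*word" in the translated text.
# \w+ stops at hyphens and apostrophes, which is good enough for a sketch.
UNKNOWN = re.compile(r"\*(\w+)")


def rank_unknown_words(translated_text, top_n=10):
    """Return the top_n most frequent unknown words as (word, count) pairs."""
    counts = Counter(word.lower() for word in UNKNOWN.findall(translated_text))
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Invented sample line, not real Apertium output.
    sample = "Jeg har en *bokhylle og en *sykkel . Min *sykkel er rød ."
    for word, count in rank_unknown_words(sample):
        print(f"{count}\t{word}")
```

The resulting list is only a set of candidates: each entry still has to
be checked and given the right paradigm by hand before it goes into the
dictionaries.
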
> 
> > > 
> > > Work from frequency and add them a word at a time. Do not try and work
> > > with derivational morphology while the coverage is so low. 
> > 
> > As I've said: Why not? What's the drawback?
> 
> The drawback is that it is unpredictable, and you end up with crappy
> dictionaries. Even the current compounding mechanism, between two
> languages like Dutch and Afrikaans, is only around 90% accurate. And that
> is for noun-noun compounds, which are the most predictable. If you start
> to add derivation, you will decrease accuracy, probably to the point
> where it causes more problems than it solves.

OK. Thank you for explaining. As you've probably noted, I always ask
"why". I never take anything for granted. That way I learn a lot and
avoid doing stupid things: just because everyone has always done things
in some way, it doesn't mean that it's the best way, nor that it's the
way that suits me best.

I plan to make some more improvements to the pair Swedish-Danish (sv-da)
and then start working with Norwegian - Swedish (no-sv).

> 
> Fran
> 
> 
Yours,
Per Tunedal

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff