On Sun, 11 Nov 2012 at 14:11 +0100, Per Tunedal wrote:
> Hi,
> OK. I just thought the other way around:
> 
> Because coverage is so low, it would be fruitful to generate
> translations for unknown words.
> 
> In the next step, I intended to add the most frequent words, bit by bit.

Great!

> As you have pointed out, it's much more effective to have a word in the
> dictionaries than to generate it by some rule. Thus the gain is
> obviously largest from adding the most frequent compounds and
> derivations explicitly in the dictionaries. But it's still nice to get
> translations of the more rare compounds and derivations.

Bad investment in terms of time. You want your work to have maximum, not
minimum impact. Thus, work by frequency. Add the frequent stuff first.

> See my comments below.
> 
> Yours,
> Per Tunedal
> 
> On Sun, Nov 11, 2012, at 11:48, Francis Tyers wrote:
> > On Sun, 11 Nov 2012 at 10:46 +0100, Per Tunedal wrote:
> > > Hi again Mikel,
> > > do you have any examples of this? I need to see all the "XML clutter" to
> > > understand how to use it practically.
> > > 
> > > This general translation of some word categories might be useful for
> > > Swedish (sv) - Danish (da) and very useful for Norwegian (no) - Swedish
> > > (sv). There are a lot of words that behave just as in your example.
> > 
> > Don't try and do derivational morphology in the bilingual dictionary.
> 
> Why? I just thought this might be interesting to try out.

Because it causes more problems than it solves. 

> > 
> > > Further:
> > > 
> > > I am reflecting on the best way of treating prefixes, used to change the
> > > meaning of a word. First I thought of attacking it as a compound, but
> > > I'm not sure that's the best way. Maybe something like your example
> > > would be better? Or even a third solution?
> > 
> > Don't do it. Work on stuff that is really going to affect the quality of
> > the translation. 
> 
> Well, the most blatant errors are:
> 1. Low word coverage. And I just wanted to try a solution that quickly
> increases the coverage. Then there wouldn't be any panic for adding more
> words, but it would increase the translation quality (and speed) one
> step further. It would be a pleasure, not a plight to add new words. 

There is no solution that quickly improves the coverage, without quickly
adding words. If you can't manage adding a few words, then I think that
MT is not for you.

> 2. Strange errors probably due to mistakes made by the tagger. And
> you've told me that it isn't any use to train the tagger before adding
> some 20 000 words. That would take me some 20 years. It's simply out of
> the question.

If you think adding 20,000 words would take 20 years then you must be a
very slow worker. For me, it would take about two months full time, or 6
months part time, or perhaps a year working an hour a day. Are you really
saying that you are more than 20--40 times slower than me? I mean, it's
a fairly simple task, I find it hard to believe that there could be such
a huge difference in productivity.

Try to measure your productivity over an hour -- or half an hour. And
tell us how much it is, and how you've been working -- how you approach
the task. It could be that you are just working really inefficiently and
we can help you get up to normal speed.
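
The measurement suggested above can be turned into a rough projection.
This is only a sketch: the function name and the session figures (30
entries in 30 minutes) are made up for illustration and are not
measurements from this thread.

```python
def hours_needed(target_words, words_added, minutes_spent):
    """Project total hours for target_words from one timed session."""
    if words_added <= 0 or minutes_spent <= 0:
        raise ValueError("need a positive word count and session length")
    rate_per_hour = words_added / (minutes_spent / 60)
    return target_words / rate_per_hour

# Hypothetical session: 30 entries added in 30 minutes.
hours = hours_needed(20000, 30, 30)
print(f"about {hours:.0f} hours of work for 20,000 words")
```

Dividing the result by the hours you can spend per day gives the number
of working days to expect.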

> Thus, I would have to use some other strategy. I will try different
> strategies to add a large number of words at a time.

Add words by frequency. Use existing resources. Generate candidates
using scripts, then postedit them.
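
As one possible illustration of working by frequency, the sketch below
counts the word forms in a corpus that are missing from a set of known
words and lists them most frequent first, together with the current
token coverage. It assumes a naive regex tokenizer and an in-memory set
of known forms; the function names and the toy Swedish sentence are
hypothetical, and a real setup would take the unknown forms from the
language pair's own analyser.

```python
from collections import Counter
import re

def tokenize(text):
    # Naive tokenizer: lowercased runs of letters (an assumption, not
    # how a real morphological analyser segments text).
    return re.findall(r"[^\W\d_]+", text.lower())

def coverage(tokens, known):
    # Fraction of running tokens already present in the known set.
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in known) / len(tokens)

def unknown_by_frequency(tokens, known):
    # Unknown forms, most frequent first; ties broken alphabetically.
    counts = Counter(t for t in tokens if t not in known)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy data, for illustration only.
corpus = "hunden ser katten och katten ser hunden och fågeln"
known = {"och", "ser"}

tokens = tokenize(corpus)
candidates = unknown_by_frequency(tokens, known)
for form, count in candidates:
    print(count, form)
print(f"coverage: {coverage(tokens, known):.1%}")
```

The top of such a list is where adding entries pays off most; the
candidates still need to be checked and postedited by hand.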

> > 
> > Work from frequency and add them one word at a time. Do not try and work
> > with derivational morphology while the coverage is so low. 
> 
> As I've said: Why not? What's the drawback?

The drawback is that it is unpredictable, and you end up with crappy
dictionaries. Even the current compounding mechanism, between two
languages like Dutch and Afrikaans, is only around 90% accurate. And that
is for noun-noun compounds, which are the most predictable. If you start
to add derivation, you will decrease accuracy, probably to the point
where it causes more problems than it solves.

Fran


_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff