Hi,

On Sun, Nov 11, 2012, at 14:27, Francis Tyers wrote:
> On Sun, 11 Nov 2012 at 14:11 +0100, Per Tunedal wrote:
> > Hi,
> > OK. I just thought the other way around:
> >
> > Because coverage is so low, it would be fruitful to generate
> > translations for unknown words.
> >
> > In the next step, I intended to add the most frequent words, bit by bit.
>
> Great!
>
> > As you have pointed out, it's much more effective to have a word in the
> > dictionaries than to generate it by some rule. Thus the gain is
> > obviously largest from adding the most frequent compounds and
> > derivations explicitly in the dictionaries. But it's still nice to get
> > translations of the more rare compounds and derivations.
>
> Bad investment in terms of time. You want your work to have maximum, not
> minimum impact. Thus, work by frequency. Add the frequent stuff first.

I agree. Obviously, I expressed myself poorly: I just meant that
compounding would be useful for the less frequent words, without
significantly reducing speed.

>
> > See my comments below.
> >
> > Yours,
> > Per Tunedal
> >
> > On Sun, Nov 11, 2012, at 11:48, Francis Tyers wrote:
> > > On Sun, 11 Nov 2012 at 10:46 +0100, Per Tunedal wrote:
> > > > Hi again Mikel,
> > > > do you have any examples of this? I need to see all the "XML clutter" to
> > > > understand how to use it practically.
> > > >
> > > > This general translation of some word categories might be useful for
> > > > Swedish (sv) - Danish (da) and very useful for Norwegian (no) - Swedish
> > > > (sv). There are a lot of words that behave just as in your example.
> > >
> > > Don't try and do derivational morphology in the bilingual dictionary.
> >
> > Why? I just thought this might be interesting to try out.
>
> Because it causes more problems than it solves.

I see.

> > > >
> > > > Further:
> > > >
> > > > I am reflecting on the best way of treating prefixes, used to change the
> > > > meaning of a word. First I thought of attacking it as a compound, but
> > > > I'm not sure that's the best way. Maybe something like your example
> > > > would be better? Or even a third solution?
> > >
> > > Don't do it. Work on stuff that is really going to affect the quality of
> > > the translation.
> >
> > Well, the most blatant errors are:
> > 1. Low word coverage. And I just wanted to try a solution that quickly
> > increases the coverage. Then there wouldn't be any panic for adding more
> > words, but it would increase the translation quality (and speed) one
> > step further. It would be a pleasure, not a plight to add new words.
>
> There is no solution that quickly improves the coverage, without quickly
> adding words. If you can't manage adding a few words, then I think that
> MT is not for you.

Well, I could always fall back on statistical MT, couldn't I?
All the same, I would like to try out Apertium. Rule-based translation
is interesting: I learn more about languages at the same time as I
learn about Apertium.

>
> > 2. Strange errors probably due to mistakes made by the tagger. And
> > you've told me that it isn't any use to train the tagger before adding
> > some 20 000 words. That would take me some 20 years. It's simply out of
> > the question.
>
> If you think adding 20,000 words would take 20 years then you must be a
> very slow worker. For me, it would take about two months full time, or 6
> months part time. Perhaps a year, working for an hour/day. Are you really
> saying that you are more than 20--40 times slower than me? I mean, it's
> a fairly simple task, I find it hard to believe that there could be such
> a huge difference in productivity.
>
> Try to measure your productivity over an hour -- or half an hour. And
> tell us how much it is, and how you've been working -- how you approach
> the task. It could be that you are just working really inefficiently and
> we can help you get up to normal speed.

Well. The largest problem is that I have a very limited knowledge of
Danish and few resources available. My main goal is to translate
Norwegian: I have by now acquired some interesting books and taken a
short university course in Norwegian.

The second problem is that I hate editing XML files, as it's so easy to
make mistakes. And I have to learn a lot of codes/tags that I'm not
really interested in. But I will manage. I have printed the Apertium
manual and will read it. I hope it will help.

>
> > Thus, I would have to use some other strategy. I will try different
> > strategies to add a large number of words at a time.
>
> Add words by frequency. Use existing resources. Generate candidates
> using scripts, then postedit them.
>
> > >
> > > Work from frequency and add them a word at a time. Do not try and work
> > > with derivational morphology while the coverage is so low.
> >
> > As I've said: Why not? What's the drawback?
>
> The drawback is that it is unpredictable, and you end up with crappy
> dictionaries. Even the current compounding mechanism, between two
> languages like Dutch and Afrikaans is only around 90% accurate. And that
> is for noun-noun compounds, which are the most predictable. If you start
> to add derivation, you will decrease accuracy, probably to the point
> where it causes more problems than it solves.

OK. Thank you for explaining. As you've probably noted, I always ask
"why". I never take anything for granted. That way I learn a lot and
avoid doing stupid things: just because everyone has always done things
a certain way doesn't mean that it's the best way, nor that it's the
way that suits me best.

I plan to do some more improvements to the pair Swedish-Danish (sv-da)
and then start working with Norwegian - Swedish (no-sv).

>
> Fran

Yours,
Per Tunedal

_______________________________________________
Apertium-stuff mailing list
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
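
The "work by frequency" advice in this thread can be made concrete with a small script. What follows is only a sketch and not part of Apertium: the function names (tokenize, coverage, unknown_by_frequency), the toy corpus and the known-word set are all hypothetical. In practice the known-word set would presumably come from whatever the dictionaries already analyse, and the corpus from real text in the source language. The script measures coverage as the share of running tokens that are known, then lists the unknown word forms with the most frequent first, which is the order in which they would be worth adding.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (Unicode letters only)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def coverage(tokens, known):
    """Return the fraction of running tokens found in the known-word set."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in known) / len(tokens)


def unknown_by_frequency(tokens, known):
    """Return (word, count) pairs for unknown words, most frequent first."""
    counts = Counter(t for t in tokens if t not in known)
    return counts.most_common()


# Toy data, assumed purely for illustration.
corpus = "Jag har en bil . Jag har en cykel och en bil ."
known = {"jag", "har", "en"}

tokens = tokenize(corpus)
print(f"coverage: {coverage(tokens, known):.1%}")
for word, count in unknown_by_frequency(tokens, known):
    print(count, word)
```

The printed list is a set of candidates to post-edit by hand, not entries ready to paste into a dictionary.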
