On Sun, 11 Nov 2012 at 15:52 +0100, Per Tunedal wrote:
> Hi,
>
> On Sun, Nov 11, 2012, at 14:27, Francis Tyers wrote:
> > On Sun, 11 Nov 2012 at 14:11 +0100, Per Tunedal wrote:
> > > Hi,
> > > >
> > > > Further:
> > > >
> > > > > I am reflecting on the best way of treating prefixes, used to
> > > > > change the meaning of a word. First I thought of attacking it as
> > > > > a compound, but I'm not sure that's the best way. Maybe something
> > > > > like your example would be better? Or even a third solution?
> > > >
> > > > Don't do it. Work on stuff that is really going to affect the
> > > > quality of the translation.
> > >
> > > Well, the most blatant errors are:
> > > 1. Low word coverage. And I just wanted to try a solution that
> > > quickly increases the coverage. Then there wouldn't be any panic
> > > about adding more words, but it would increase the translation
> > > quality (and speed) one step further. It would be a pleasure, not a
> > > plight, to add new words.
> >
> > There is no solution that quickly improves the coverage without
> > quickly adding words. If you can't manage adding a few words, then I
> > think that MT is not for you.
>
> Well, I can always fall back on statistical MT, couldn't I? All the
> same, I would like to try out Apertium. Rule-based translation is
> interesting: I learn more about languages at the same time as I learn
> about Apertium.

Yes, that's what people who don't like learning about languages and
working with dictionaries do. They use SMT. Me, I prefer learning about
languages :) It's fun!!

> > > 2. Strange errors, probably due to mistakes made by the tagger. And
> > > you've told me that it isn't any use to train the tagger before
> > > adding some 20,000 words. That would take me some 20 years. It's
> > > simply out of the question.
> >
> > If you think adding 20,000 words would take 20 years then you must be
> > a very slow worker. For me, it would take about two months full time,
> > or six months part time. Perhaps a year, working for an hour a day.
> > Are you really saying that you are more than 20--40 times slower than
> > me? I mean, it's a fairly simple task; I find it hard to believe that
> > there could be such a huge difference in productivity.
> >
> > Try to measure your productivity over an hour -- or half an hour. And
> > tell us how much it is, and how you've been working -- how you
> > approach the task. It could be that you are just working really
> > inefficiently and we can help you get up to normal speed.
>
> Well, the largest problem is that I have a very limited knowledge of
> Danish and not many resources available.

Translate from Danish to Swedish. Use the Europarl parallel corpus. You
can quite easily take a frequency list of missing words, and build a
concordance for each word using the corpus.
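For example, something along these lines (a rough sketch; the mode name,
file names and example word are just placeholders, and it assumes the
default behaviour where apertium prefixes unknown words with '*'):

  # frequency list of words the pair doesn't know yet
  apertium -d . da-sv < corpus.da \
    | tr ' ' '\n' | grep '^\*' | sed 's/^\*//; s/[.,;:?!]*$//' \
    | sort | uniq -c | sort -nr > missing.txt

  # quick concordance for one of the missing words
  grep -i 'huset' corpus.da | head -20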
> My main goal is to translate Norwegian: I have by now acquired some
> interesting books and done a short course in Norwegian at the
> university.

Great.

> The second problem is that I hate editing XML files, as it's so easy to
> make mistakes. And I have to learn a lot of codes/tags that I'm not
> really interested in. But I will manage. I have printed the Apertium
> manual and will read it. I hope it will help.

Very few people actually write the XML from scratch. Normally what I do
is make some kind of spreadsheet-style list,

  a, b, adj, adj.sint
  c, d, n.f, n.m
  e, f, vblex, vblex

and then use a simple bash or python script to convert it to XML:

  for w in `cat list | sed 's/ /_/g'`; do
    row=`echo $w | sed 's/_//g'`
    sl=`echo $row | cut -f1 -d','`
    tl=`echo $row | cut -f2 -d','`
    st=`echo $row | cut -f3 -d',' | sed 's/\./"\/><s n="/g'`
    tt=`echo $row | cut -f4 -d',' | sed 's/\./"\/><s n="/g'`
    echo '<e><p><l>'$sl'<s n="'$st'"/></l><r>'$tl'<s n="'$tt'"/></r></p></e>'
  done

> > > > Work from frequency and add them a word at a time. Do not try to
> > > > work with derivational morphology while the coverage is so low.
> > >
> > > As I've said: why not? What's the drawback?
> >
> > The drawback is that it is unpredictable, and you end up with crappy
> > dictionaries. Even the current compounding mechanism, between two
> > languages like Dutch and Afrikaans, is only around 90% accurate. And
> > that is for noun-noun compounds, which are the most predictable. If
> > you start to add derivation, you will decrease accuracy, probably to
> > the point where it causes more problems than it solves.
>
> OK. Thank you for explaining. As you've probably noted, I always ask
> "why". I never take anything for granted. That way I learn a lot and
> avoid doing stupid things: just because everyone has always done things
> in some way, it doesn't mean that it's the best way, nor that it's the
> way that suits me best.
>
> I plan to do some more improvements to the pair Swedish-Danish (se-da)

sv! :)

> and then start working with Norwegian - Swedish (no-sv).

Great!

Fran
