One more thing you could try is a "semi-oracle" system: make the translations, and choose the one that is closest to the reference translation. What is the best score you can get?
Thanks for your comments, they are useful. Regarding the comment above, how can we choose the translation closest to the reference? Could you give an example?

Sevilay

On Tue, Apr 23, 2019 at 4:44 PM Francis Tyers <fty...@prompsit.com> wrote:

> On 2019-04-23 10:27, Sevilay Bayatlı wrote:
> > Hi everyone,
> >
> > We want to improve apertium-ambiguous to get better results.
> > There are several options for that: either improving it
> > linguistically or using a new learning method.
> >
> > The first solution is possible in these cases:
> >
> > 1- plenty of time (for adding more vocabulary and writing transfer
> > rules). As I understand all of the Oguz Turkic group and some of the
> > languages in the Kipchak group, I can choose one system and improve
> > it, but based on my experiments, this improves the system only when
> > there are many more ambiguous rules and 0 out-of-vocabulary words,
> > and also if I have time.
> >
> > 2- using a new learning method, which could replace maximum entropy
> > at that step. We talked with Aboelhamd about using scikit-learn, but
> > haven't decided on a good formulation for our problem yet.
> >
> > Dear Apertiumers, we want to hear your suggestions for choosing a new
> > method instead of maximum entropy.
> >
>
> My thoughts would be to start by characterising what the problem is
> with maximum entropy, and then start to look at other methods.
>
> For example, determine what role the amount of data plays. Try with
> 10%, 25%, 50%, 75%, 100% and look at the learning curve; if it doesn't
> seem to be plateauing then perhaps try adding more data.
>
> Another thing would be to look at the number of ambiguous rules: try
> with 1, 2, 5, 10, ... and see what the learning curve is. How much
> difference does each rule ambiguity add?
>
> In addition, you could think of adding more features, for example,
> tags as well as lemmas.
>
> One more thing you could try is a "semi-oracle" system:
>
> Make the translations, and choose the one that is closest to the
> reference translation. What is the best score you can get?
>
> After doing this I think it would be worthwhile looking at other
> methods. SVM is one option, as are CRFs and RNNs, but remember
> that an RNN needs a lot of data, so I'm not sure how much
> sense it makes to look at that unless you are able to process
> a lot more data more efficiently.
>
> Best regards,
>
> Francis M. Tyers
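
To make the "semi-oracle" suggestion concrete, here is a minimal Python sketch. It assumes that for each source sentence you already have the list of candidate translations produced by the different ambiguous rule combinations, plus one reference translation. The similarity measure (a token-level ratio from the standard library's difflib) is only a stand-in and is not something proposed in this thread; a sentence-level metric such as BLEU or WER could be swapped in. The function names and the toy data are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(candidate, reference):
    """Token-level similarity in [0, 1]; a stand-in for sentence-level BLEU/WER."""
    return SequenceMatcher(None, candidate.split(), reference.split()).ratio()


def pick_oracle(candidates, reference):
    """Return the candidate translation that is closest to the reference."""
    return max(candidates, key=lambda c: similarity(c, reference))


def oracle_score(corpus):
    """Average similarity of the oracle choice over (candidates, reference) pairs."""
    scores = [similarity(pick_oracle(cands, ref), ref) for cands, ref in corpus]
    return sum(scores) / len(scores)


# Toy data, invented for illustration only.
corpus = [
    (["the cat sat on mat", "the cat sat on the mat", "cat the sat mat on"],
     "the cat sat on the mat"),
    (["he go home", "he goes home"], "he goes home"),
]

# The oracle peeks at the reference, so it is an evaluation tool, not a model.
best = [pick_oracle(cands, ref) for cands, ref in corpus]
print(best)
print(oracle_score(corpus))
```

Under this per-sentence averaged measure, the oracle score is the best any selection method could reach when choosing among the same candidates, so comparing it with the current maximum entropy score gives a rough idea of how much room for improvement the selection step has.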
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff