Thanks Sevilay for your feedback, and thanks for the resources. (I have also appended, at the bottom of this message, a rough sketch of the per-pattern neural network idea I mention below.)

On Sun, 7 Apr 2019, 18:42 Sevilay Bayatlı <sevilaybaya...@gmail.com> wrote:
> Hi Aboelhamd,
>
> Your proposal looks good. I found these resources that may be of benefit:
>
> Multi-source neural translation
> https://arxiv.org/abs/1601.00710
>
> Neural machine translation with extended context
> https://arxiv.org/abs/1708.05943
>
> Handling homographs in neural machine translation
> https://arxiv.org/abs/1708.06510
>
> Sevilay
>
> On Sun, Apr 7, 2019 at 7:14 PM Aboelhamd Aly <aboelhamd.abotr...@gmail.com> wrote:
>
>> Hi all,
>>
>> I have an idea, not yet solid, for an alternative to yasmet and the
>> maximum entropy models: using neural networks to give us scores for the
>> ambiguous rules. I haven't yet set a formulation for the problem, nor the
>> structure of the inputs and outputs, or even the goal, as I think there
>> are many formulations we could adopt.
>>
>> For example, the most straightforward structure is to give the network
>> all the possible combinations of a sentence's translations and let it
>> choose the best one, or assign them weights, so that the network learns
>> which combinations to choose for a specific pair.
>>
>> Another example: instead of building one network per pair, we build one
>> network per ambiguous pattern, as we did with the maximum entropy models.
>> We give the network the combinations for that pattern and let it assign
>> weights to the ambiguous rules applied to that pattern.
>>
>> For each structure there are many details and questions still to answer.
>>
>> With that said, I decided to look at some papers to see what others have
>> done before to tackle similar problems or this exact problem, and how
>> some of them used machine learning or deep learning to solve them, and
>> then to try to build on that.
>>
>> Some papers' resolutions were very specific to the pairs they were
>> developed for, and thus are not very relevant to our case:
>> 1) Resolving Structural Transfer Ambiguity in Chinese-to-Korean Machine
>>    Translation (2003)
>>    https://www.worldscientific.com/doi/10.1142/S0219427903000887
>> 2) Arabic Machine Translation: A Developmental Perspective (2010)
>>    http://www.ieee.ma/IJICT/IJICT-SI-Bouzoubaa-3.3/2%20-%20paper_farghaly.pdf
>>
>> Some other papers tried not to generate ambiguous rules, or to minimize
>> the ambiguity during transfer-rule inference, and don't provide any
>> method for resolving the ambiguity in our case.
>> I thought they might be of some help, but I think they are far from our
>> topic:
>> 1) Learning Transfer Rules for Machine Translation with Limited Data (2005)
>>    http://www.cs.cmu.edu/~kathrin/ThesisSummary/ThesisSummary.pdf
>> 2) Inferring Shallow-Transfer Machine Translation Rules from Small
>>    Parallel Corpora (2009)
>>    https://arxiv.org/pdf/1401.5700.pdf
>>
>> Now I am looking into some more recent papers:
>> 1) Rule Based Machine Translation Combined with Statistical Post Editor
>>    for Japanese to English Patent Translation (2007)
>>    http://www.mt-archive.info/MTS-2007-Ehara.pdf
>> 2) Machine translation model using inductive logic programming (2009)
>>    https://scholar.cu.edu.eg/?q=shaalan/files/101.pdf
>> 3) Machine Learning for Hybrid Machine Translation (2012)
>>    https://www.aclweb.org/anthology/W12-3138.pdf
>> 4) Study and Comparison of Rule-Based and Statistical Catalan-Spanish
>>    Machine Translation Systems (2012)
>>    https://pdfs.semanticscholar.org/a731/0d0c15b22381c7b372e783d122a5324b005a.pdf?_ga=2.89511443.981790355.1554651923-676013054.1554651923
>> 5) Latest trends in hybrid machine translation and its applications (2015)
>>    https://www.sciencedirect.com/science/article/pii/S0885230814001077
>> 6) Machine Translation: Phrase-Based, Rule-Based and Neural Approaches
>>    with Linguistic Evaluation (2017)
>>    http://www.dfki.de/~ansr01/docs/MacketanzEtAl2017_CIT.pdf
>> 7) A Multitask-Based Neural Machine Translation Model with Part-of-Speech
>>    Tags Integration for Arabic Dialects (2018)
>>    https://www.mdpi.com/2076-3417/8/12/2502/htm
>>
>> I hope they give me some more insights and thoughts.
>>
>> --------------
>>
>> - Do you have recommendations of other papers that address the same
>>   problem?
>> - About the proposal: I modified it a little and shared it through the
>>   GSoC website as a draft. Do you have any last feedback or thoughts on
>>   it, or do I just submit it as final?
>> - Last thing, about the coding challenge (integrating weighted transfer
>>   rules with apertium-transfer): I think it's finished, but I didn't get
>>   any feedback or response about it, and the pull request is not yet
>>   merged into master.
>>
>> Thanks,
>> Aboelhamd
>>
>> On Sat, Apr 6, 2019 at 5:23 AM Aboelhamd Aly <aboelhamd.abotr...@gmail.com> wrote:
>>
>>> Hi Sevilay, hi spectei,
>>>
>>> For sentence splitting, I think we need to know neither the syntax nor
>>> the sentence boundaries of the language. I also don't see any necessity
>>> for applying it at runtime: at runtime we only get the score of each
>>> pattern, so there is no need for splitting there. I also had a thought
>>> about using beam search here, as I see it has no effect, though I may
>>> be wrong; we can discuss that after we close this thread.
>>>
>>> We will handle the whole text as one unit and depend only on the
>>> captured patterns, knowing that, in chunker terms, successive patterns
>>> that don't share a transfer rule are independent. So, using the lexical
>>> form of the text, we match the words against patterns, then match the
>>> patterns against rules. Hence we know which patterns are ambiguous and
>>> how many ambiguous rules each of them matches.
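>>>
>>> As a quick illustration of the two counts involved, here is a minimal
>>> Python sketch (the rule counts are the toy values from the example that
>>> follows; nothing here touches the real data structures):
>>>
>>> from functools import reduce
>>>
>>> # Number of ambiguous rules matched by each captured pattern, in text
>>> # order (toy values from the example below).
>>> rule_counts = [2, 1, 6, 4, 3, 5, 1, 4, 4, 6, 8, 5, 5, 1, 3, 2]
>>>
>>> # Old method: score every full combination of rule choices across the
>>> # whole text -- the product of the per-pattern counts.
>>> print(reduce(lambda a, b: a * b, rule_counts))  # 82944000
>>>
>>> # New method: vary one pattern's rules at a time, fixing every other
>>> # pattern at its LRLM rule -- the sum of the per-pattern counts.
>>> print(sum(rule_counts))  # 60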
>>>
>>> For example, suppose we have a text with the following patterns, each
>>> with its number of matching rules:
>>> p1:2 p2:1 p3:6 p4:4 p5:3 p6:5 p7:1 p8:4 p9:4 p10:6 p11:8 p12:5 p13:5 p14:1 p15:3 p16:2
>>>
>>> If such a text were handled by our old method, generating all possible
>>> combinations (the multiplication of the rule counts), we would have
>>> 82944000 combinations, which is not at all practical to score and takes
>>> heavy computation and memory. If it were handled by our new method,
>>> applying all the ambiguous rules of one pattern while fixing the other
>>> patterns at the LRLM rule (the addition of the rule counts), we would
>>> have just 60 combinations, and not all of them different: a drastically
>>> low number, which may not be very representative.
>>>
>>> But if we apply the splitting idea, we get something in the middle,
>>> which will hopefully avoid the disadvantages of both methods and keep
>>> the advantages of both. Let's proceed from the start of the text to its
>>> end, while maintaining a threshold of, say, 24000 combinations:
>>>
>>> p1 => 2 ,, p1 p2 => 2 ,, p1 p2 p3 => 12 ,, p1 p2 p3 p4 => 48 ,,
>>> p1 p2 p3 p4 p5 => 144 ,, p1 p2 p3 p4 p5 p6 => 720 ,,
>>> p1 p2 p3 p4 p5 p6 p7 => 720 ,, p1 p2 p3 p4 p5 p6 p7 p8 => 2880 ,,
>>> p1 p2 p3 p4 p5 p6 p7 p8 p9 => 11520
>>>
>>> We stop here, because taking the next pattern would exceed the
>>> threshold. Having our first split, we can now continue our work on it
>>> as usual, but with more (yet not overwhelming) combinations, which
>>> should capture more semantics. After that, we take the next split, and
>>> so on. (A rough sketch of this splitting follows below.)
>>>
>>> -----------
>>>
>>> I agree with you that testing the current method with more than one
>>> pair, to know its accuracy, is the priority, and we are currently
>>> working on that.
>>>
>>> -----------
>>>
>>> As for an alternative to yasmet, I agree with spectei. Unfortunately,
>>> for now I don't have a solid idea to discuss, but in the next few days
>>> I will try to come up with one or more ideas.
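>>>
>>> Here is a rough Python sketch of the thresholded splitting (the
>>> threshold and rule counts are the toy values from the example above;
>>> this is not integrated with the actual transfer code):
>>>
>>> def split_by_threshold(rule_counts, threshold=24000):
>>>     """Greedily split per-pattern ambiguous-rule counts so that each
>>>     split's combination count (the product of its counts) stays at
>>>     or below the threshold."""
>>>     splits, current, product = [], [], 1
>>>     for count in rule_counts:
>>>         if current and product * count > threshold:
>>>             splits.append(current)  # close the current split
>>>             current, product = [], 1
>>>         current.append(count)
>>>         product *= count
>>>     if current:
>>>         splits.append(current)
>>>     return splits
>>>
>>> # First split is p1..p9 (product 11520); taking p10 would give 69120.
>>> print(split_by_threshold([2, 1, 6, 4, 3, 5, 1, 4, 4, 6, 8, 5, 5, 1, 3, 2]))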
>>>
>>> On Fri, Apr 5, 2019 at 11:23 PM Francis Tyers <fty...@prompsit.com> wrote:
>>>
>>>> On 2019-04-05 20:57, Sevilay Bayatlı wrote:
>>>> > On Fri, 5 Apr 2019, 22:41 Francis Tyers, <fty...@prompsit.com> wrote:
>>>> >
>>>> >> On 2019-04-05 19:07, Sevilay Bayatlı wrote:
>>>> >>> Hi Aboelhamd,
>>>> >>>
>>>> >>> There are some points in your proposal:
>>>> >>>
>>>> >>> First, I do not think "splitting sentences" is a good idea. Each
>>>> >>> language has a different syntax; how could you know when you
>>>> >>> should split the sentence?
>>>> >>
>>>> >> Apertium works on the concept of a stream of words, so at runtime
>>>> >> we can't really rely on robust sentence segmentation.
>>>> >>
>>>> >> We can often use it, e.g. for training, but if sentence boundary
>>>> >> detection were to be included, it would need to be trained, as
>>>> >> Sevilay hints at.
>>>> >>
>>>> >> Also, I'm not sure how much we would gain from that.
>>>> >>
>>>> >>> Second, "substitute yasmet with another method": I think the
>>>> >>> result will not be any better if you substitute it with a
>>>> >>> statistical method.
>>>> >>
>>>> >> Substituting yasmet with a more up-to-date machine-learning method
>>>> >> might be a worthwhile thing to do. What suggestions do you have?
>>>> >
>>>> > I think first we have to try the exact method with more than 3
>>>> > language pairs and then decide whether to substitute it or not,
>>>> > because what is the point of a new method if it doesn't achieve any
>>>> > gain? Then we can compare the results of the two methods and choose
>>>> > the best one. What do you think?
>>>>
>>>> Yes, testing it with more language pairs is also a priority.
>>>>
>>>> Fran
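
P.S. To make the neural-network idea in my quoted message above a bit more
concrete, here is a very rough, purely illustrative sketch of the "one
network per ambiguous pattern" option. Every concrete choice in it (the
features, the layer sizes, PyTorch itself) is a placeholder rather than a
decision:

import torch
import torch.nn as nn

class PatternRuleScorer(nn.Module):
    """Toy scorer for one ambiguous pattern: map a (yet to be defined)
    feature vector for an occurrence of the pattern to a weight for
    each of the pattern's ambiguous rules."""

    def __init__(self, n_features, n_rules):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64),
            nn.ReLU(),
            nn.Linear(64, n_rules),
        )

    def forward(self, features):
        # Softmax so the rule weights are comparable across occurrences.
        return torch.softmax(self.net(features), dim=-1)

# Hypothetical usage: a pattern with 4 ambiguous rules and a
# 32-dimensional context representation (both invented numbers).
scorer = PatternRuleScorer(n_features=32, n_rules=4)
weights = scorer(torch.randn(32))  # tensor of 4 rule weights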
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff