Yes, GRUs and LSTMs are better than traditional RNNs. I think we will use one of them.
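[Editor's note: for reference, a minimal sketch of what a GRU-based language model could look like, assuming PyTorch; the class and parameter names are illustrative only and not part of any existing Apertium code. Swapping nn.GRU for nn.LSTM gives the LSTM variant (its recurrent state is a (hidden, cell) tuple). The gating is what lets gradients flow across long spans, which is the advantage over a plain RNN mentioned above.]

import torch
import torch.nn as nn

class GRULanguageModel(nn.Module):
    """Minimal GRU language model: predicts the next token at every position."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, hidden=None):
        # tokens: (batch, seq_len) tensor of token ids
        emb = self.embed(tokens)
        output, hidden = self.gru(emb, hidden)
        return self.out(output), hidden  # logits over the next token

    def sentence_logprob(self, tokens):
        # Sum of log-probabilities of each token given its prefix,
        # i.e. the score the model assigns to the whole sentence.
        logits, _ = self.forward(tokens[:, :-1])
        logprobs = torch.log_softmax(logits, dim=-1)
        targets = tokens[:, 1:].unsqueeze(-1)
        return logprobs.gather(-1, targets).squeeze(-1).sum(dim=-1)

# Example (hypothetical vocabulary size):
# model = GRULanguageModel(vocab_size=10000)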
On Mon, Apr 22, 2019 at 12:08 AM Sevilay Bayatlı <sevilaybaya...@gmail.com> wrote:

> I agree with changing the n-gram LM, but with which one, RNN or GRU? As I see from the literature, GRU has more advantages than RNN.
>
> Sevilay
>
> On Mon, Apr 22, 2019 at 12:09 AM Aboelhamd Aly <aboelhamd.abotr...@gmail.com> wrote:
>
>> Hi Sevilay,
>>
>> I think a new language model that could distinguish the best ambiguous combination(s) of a translation would eliminate our need for the max entropy model or any other method.
>> But whether that is the case with an RNN LM, I don't know yet.
>> For now, do you agree that we need to change the LM first, or do you prefer going straight to an alternative method for max entropy? And do you have any idea for such an alternative method?
>> In my opinion, fixing all the bugs, evaluating our current system, and then changing the n-gram LM to an RNN is the priority plan for the next two weeks or so.
>> After that we can focus the research on what's next, if the accuracy is not good enough or there is room for improvement.
>> Do you agree with this?
>>
>> Regards,
>> Aboelhamd
>>
>> On Sun, Apr 21, 2019 at 10:48 PM Sevilay Bayatlı <sevilaybaya...@gmail.com> wrote:
>>
>>> Aboelhamd,
>>>
>>> I think using Gated Recurrent Units (GRUs) instead of the n-gram language model is a good idea; we can probably achieve more gain. However, the most important part here is changing the maximum entropy model.
>>>
>>> Let's see what Fran thinks about it.
>>>
>>> Regards,
>>>
>>> Sevilay
>>>
>>> On Fri, Apr 19, 2019 at 10:29 PM Aboelhamd Aly <aboelhamd.abotr...@gmail.com> wrote:
>>>
>>>> Hi Sevilay. Hi Francis,
>>>>
>>>> Unfortunately, Sevilay reported that the evaluation results of the kaz-tur and spa-eng pairs were very bad, with only 30% of the tested sentences being good compared to Apertium's LRLM resolution.
>>>> So we discussed what to do next, which is to utilize the breakthrough of deep learning neural networks in NLP, and especially in machine translation.
>>>> We also discussed using values of n greater than 5 in the already used n-gram language model, and evaluating the result of increasing n, which could give us some more insight into what to do next and how to do it.
>>>>
>>>> Since I have an intro to deep learning course this term in college, I spent the past two weeks being introduced to the application of deep learning in NLP and MT.
>>>> Now I have the basics of recurrent neural networks (RNNs) and why to use them instead of a standard feed-forward network in NLP, besides understanding their different architectures and the math done in forward and back propagation.
>>>> I also know how to build a simple language model, and how to avoid the vanishing gradient problem (which leads to not capturing long dependencies) by using Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) networks.
>>>>
>>>> As a next step, we will consider working only on the language model and leave the max entropy part for later discussions.
>>>> So along with trying different n values in the n-gram language model and evaluating the results, I will try either to use a ready RNNLM or to build a new one from scratch from what I have learnt so far. Honestly, I prefer the latter, because it will increase my experience in applying what I have learnt.
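[Editor's note: a brief aside on the n-gram scoring discussed in this message. Below is a minimal sketch of scoring candidate translation combinations with an n-gram LM, assuming the KenLM Python bindings and a hypothetical ARPA model file; the thread does not say which LM toolkit the project actually uses, so treat the names as placeholders. The detail that matters, and that comes up again further down, is that the scores are log10 probabilities: the best candidate is the one with the largest (least negative) score, not the largest magnitude.]

# pip install https://github.com/kpu/kenlm/archive/master.zip
import kenlm

# Hypothetical 5-gram (or 8-gram) ARPA model trained on target-language text.
model = kenlm.Model("target-lang.arpa")

# Candidate target-side realisations of one ambiguous sentence.
candidates = [
    "this is the first candidate translation",
    "this be first candidate the translation",
]

# model.score() returns a log10 probability, so all values are negative and
# the best candidate is the maximum (closest to zero), not the one with the
# largest magnitude.
best = max(candidates, key=lambda s: model.score(s, bos=True, eos=True))
print(best)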
>>>> In the last two weeks I implemented RNNs with GRUs and LSTMs, and also implemented a character-based language model, as two assignments, and they were very fun to do. So implementing a word-based RNN LM will not take much time, though it may not be close to the state-of-the-art models, and that is its disadvantage.
>>>>
>>>> Using an NNLM instead of the n-gram LM has these possible advantages:
>>>> - It automatically learns syntactic and semantic features.
>>>> - It overcomes the curse of dimensionality by generalizing better.
>>>>
>>>> ----------------------------------------------
>>>>
>>>> I tried using n=8 instead of 5 in the n-gram LM, but the scores weren't that different, as Sevilay pointed out in our discussion.
>>>> I knew that an NNLM is better than a statistical one, and also that using machine learning instead of the maximum entropy model would give better performance.
>>>> *But* the evaluation results were very, very disappointing, unexpected and illogical, so I thought there might be a bug in the code.
>>>> After some searching, I found that I had made a very silly *mistake* in normalizing the LM scores. Since the scores are the log base 10 of the sentence probability, the higher the magnitude, the lower the probability; but what I did was the inverse of that, and that was the cause of the very bad results.
>>>>
>>>> I am fixing this now and then will re-evaluate the results with Sevilay.
>>>>
>>>> Regards,
>>>> Aboelhamd
>>>>
>>>> On Sun, Apr 7, 2019 at 6:46 PM Aboelhamd Aly <aboelhamd.abotr...@gmail.com> wrote:
>>>>
>>>>> Thanks Sevilay for your feedback, and thanks for the resources.
>>>>>
>>>>> On Sun, 7 Apr 2019, 18:42 Sevilay Bayatlı <sevilaybaya...@gmail.com> wrote:
>>>>>
>>>>>> Hi Aboelhamd,
>>>>>>
>>>>>> Your proposal looks good. I found these resources that may be of benefit:
>>>>>>
>>>>>> Multi-source neural translation: https://arxiv.org/abs/1601.00710
>>>>>> Neural machine translation with extended context: https://arxiv.org/abs/1708.05943
>>>>>> Handling homographs in neural machine translation: https://arxiv.org/abs/1708.06510
>>>>>>
>>>>>> Sevilay
>>>>>>
>>>>>> On Sun, Apr 7, 2019 at 7:14 PM Aboelhamd Aly <aboelhamd.abotr...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have a not-yet-solid idea for an alternative to yasmet and the max entropy models:
>>>>>>> using neural networks to give us scores for the ambiguous rules.
>>>>>>> But I haven't yet settled on a formulation of the problem, nor the structure of the inputs, the outputs, or even the goal,
>>>>>>> as I think there are many formulations that we could adopt.
>>>>>>>
>>>>>>> For example, the most straightforward structure is to give the network all the possible combinations
>>>>>>> of a sentence's translations and let it choose the best one, or give them weights.
>>>>>>> Hence, the network learns which combinations to choose for a specific pair.
>>>>>>>
>>>>>>> Another example is, instead of building one network per pair,
>>>>>>> building one network per ambiguous pattern, as we did with the max entropy models.
>>>>>>> So we give the network the combinations for that pattern,
>>>>>>> and let it assign weights to the ambiguous rules applied to that pattern.
>>>>>>>
>>>>>>> And for each structure there are many details and questions yet to answer.
>>>>>>>
>>>>>>> So with that said, I decided to look at some papers to see what others have done before
>>>>>>> to tackle similar problems or the exact problem, and how some of them used machine learning
>>>>>>> or deep learning to solve these problems, and then try to build on them.
>>>>>>>
>>>>>>> Some papers' resolutions were very specific to the pairs they were developed for, and thus not very relevant to our case:
>>>>>>> 1) Resolving Structural Transfer Ambiguity in Chinese-to-Korean Machine Translation (2003): https://www.worldscientific.com/doi/10.1142/S0219427903000887
>>>>>>> 2) Arabic Machine Translation: A Developmental Perspective (2010): http://www.ieee.ma/IJICT/IJICT-SI-Bouzoubaa-3.3/2%20-%20paper_farghaly.pdf
>>>>>>>
>>>>>>> Some other papers tried not to generate ambiguous rules, or to minimize the ambiguity in transfer-rule inference, and didn't provide any methods to resolve the ambiguity in our case. I thought they might provide some help, but I think they are far from our topic:
>>>>>>> 1) Learning Transfer Rules for Machine Translation with Limited Data (2005): http://www.cs.cmu.edu/~kathrin/ThesisSummary/ThesisSummary.pdf
>>>>>>> 2) Inferring Shallow-Transfer Machine Translation Rules from Small Parallel Corpora (2009): https://arxiv.org/pdf/1401.5700.pdf
>>>>>>>
>>>>>>> Now I am looking into some more recent papers, like:
>>>>>>> 1) Rule Based Machine Translation Combined with Statistical Post Editor for Japanese to English Patent Translation (2007): http://www.mt-archive.info/MTS-2007-Ehara.pdf
>>>>>>> 2) Machine translation model using inductive logic programming (2009): https://scholar.cu.edu.eg/?q=shaalan/files/101.pdf
>>>>>>> 3) Machine Learning for Hybrid Machine Translation (2012): https://www.aclweb.org/anthology/W12-3138.pdf
>>>>>>> 4) Study and Comparison of Rule-Based and Statistical Catalan-Spanish Machine Translation Systems (2012): https://pdfs.semanticscholar.org/a731/0d0c15b22381c7b372e783d122a5324b005a.pdf?_ga=2.89511443.981790355.1554651923-676013054.1554651923
>>>>>>> 5) Latest trends in hybrid machine translation and its applications (2015): https://www.sciencedirect.com/science/article/pii/S0885230814001077
>>>>>>> 6) Machine Translation: Phrase-Based, Rule-Based and Neural Approaches with Linguistic Evaluation (2017): http://www.dfki.de/~ansr01/docs/MacketanzEtAl2017_CIT.pdf
>>>>>>> 7) A Multitask-Based Neural Machine Translation Model with Part-of-Speech Tags Integration for Arabic Dialects (2018): https://www.mdpi.com/2076-3417/8/12/2502/htm
>>>>>>>
>>>>>>> And I hope they give me some more insights and thoughts.
>>>>>>>
>>>>>>> --------------
>>>>>>>
>>>>>>> - So, do you have recommendations for other papers that address the same problem?
>>>>>>> - Also, about the proposal: I modified it a little bit and shared it through the GSoC website as a draft,
>>>>>>> so do you have any last feedback or thoughts about it, or should I just submit it as the final proposal?
>>>>>>> - Last thing, about the coding challenge (integrating weighted transfer rules with apertium-transfer):
>>>>>>> I think it's finished, but I didn't get any feedback or response about it,
>>>>>>> and the pull request is not merged into master yet.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Aboelhamd
>>>>>>>
>>>>>>> On Sat, Apr 6, 2019 at 5:23 AM Aboelhamd Aly <aboelhamd.abotr...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Sevilay, hi spectei,
>>>>>>>>
>>>>>>>> For sentence splitting, I think that we need to know neither the syntax nor the sentence boundaries of the language.
>>>>>>>> Also, I don't see any necessity for applying it at runtime, as at runtime we only get the score of each pattern,
>>>>>>>> so there is no need for splitting. I also had one thought on using beam search here, as I see it has no effect,
>>>>>>>> but maybe I am wrong. We can discuss it after we close this thread.
>>>>>>>>
>>>>>>>> We will handle the whole text as one unit and will depend only on the captured patterns,
>>>>>>>> knowing that, in chunker terms, successive patterns that don't share a transfer rule are independent.
>>>>>>>> So by using the lexical form of the text, we match the words with patterns, then match patterns with rules.
>>>>>>>> And hence we know which patterns are ambiguous and how many ambiguous rules they match.
>>>>>>>>
>>>>>>>> For example, say we have a text with the following patterns and corresponding rule counts:
>>>>>>>> p1:2 p2:1 p3:6 p4:4 p5:3 p6:5 p7:1 p8:4 p9:4 p10:6 p11:8 p12:5 p13:5 p14:1 p15:3 p16:2
>>>>>>>>
>>>>>>>> If such a text were handled by our old method of generating all possible combinations (the product of the rule counts),
>>>>>>>> we would have 82,944,000 possible combinations, which are not practical at all to score, and take heavy computation and memory.
>>>>>>>> And if it is handled by our new method of applying all the ambiguous rules of one pattern while fixing the other patterns at the LRLM rule
>>>>>>>> (the sum of the rule counts), we will have just 60 combinations, and not all of them different,
>>>>>>>> giving a drastically low number of combinations, which may not be very representative.
>>>>>>>>
>>>>>>>> But if we apply the splitting idea, we will have something in the middle that will hopefully avoid the disadvantages of both methods and benefit from the advantages of both.
>>>>>>>> Let's proceed from the start of the text to its end, while maintaining some threshold of, say, 24,000 combinations:
>>>>>>>> p1 => 2 ,, p1 p2 => 2 ,, p1 p2 p3 => 12 ,, p1 p2 p3 p4 => 48 ,, p1 p2 p3 p4 p5 => 144 ,,
>>>>>>>> p1 p2 p3 p4 p5 p6 => 720 ,, p1 p2 p3 p4 p5 p6 p7 => 720 ,,
>>>>>>>> p1 p2 p3 p4 p5 p6 p7 p8 => 2880 ,, p1 p2 p3 p4 p5 p6 p7 p8 p9 => 11520
>>>>>>>>
>>>>>>>> And then we stop here, because taking the next pattern would exceed the threshold.
>>>>>>>> Having our first split, we can now continue our work on it as usual,
>>>>>>>> but with more (though not overwhelming) combinations, which would capture more semantics.
>>>>>>>> After that, we take the next split, and so on (a small sketch of this procedure is included below).
>>>>>>>>
>>>>>>>> -----------
>>>>>>>>
>>>>>>>> I agree with you that testing the current method with more than one pair to know its accuracy is the priority,
>>>>>>>> and we are currently working on it.
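[Editor's note: a minimal sketch of the threshold-based splitting described in the message above; the function name and the threshold default are illustrative, not existing Apertium code. On the example counts it reproduces the numbers quoted there: the product over all sixteen patterns is 82,944,000, and with a 24,000 threshold the first split covers p1..p9 with 11,520 combinations.]

def split_by_combinations(pattern_rule_counts, threshold=24000):
    """Greedily group consecutive patterns so that the product of their
    ambiguous-rule counts stays at or below the threshold."""
    splits, current, product = [], [], 1
    for count in pattern_rule_counts:
        if current and product * count > threshold:
            splits.append(current)   # close the current split
            current, product = [], 1
        current.append(count)
        product *= count
    if current:
        splits.append(current)
    return splits

# Rule counts for p1..p16 from the example above.
counts = [2, 1, 6, 4, 3, 5, 1, 4, 4, 6, 8, 5, 5, 1, 3, 2]
print(split_by_combinations(counts))
# -> [[2, 1, 6, 4, 3, 5, 1, 4, 4], [6, 8, 5, 5, 1, 3, 2]]
# i.e. p1..p9 (11520 combinations) and p10..p16 (7200 combinations)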
>>>>>>>>
>>>>>>>> -----------
>>>>>>>>
>>>>>>>> As for an alternative to yasmet, I agree with spectei. Unfortunately, for now I don't have a solid idea to discuss,
>>>>>>>> but in the next few days I will try to come up with one or more ideas to discuss.
>>>>>>>>
>>>>>>>> On Fri, Apr 5, 2019 at 11:23 PM Francis Tyers <fty...@prompsit.com> wrote:
>>>>>>>>
>>>>>>>>> On 2019-04-05 20:57, Sevilay Bayatlı wrote:
>>>>>>>>> > On Fri, 5 Apr 2019, 22:41 Francis Tyers, <fty...@prompsit.com> wrote:
>>>>>>>>> >
>>>>>>>>> >> On 2019-04-05 19:07, Sevilay Bayatlı wrote:
>>>>>>>>> >>> Hi Aboelhamd,
>>>>>>>>> >>>
>>>>>>>>> >>> There are some points in your proposal:
>>>>>>>>> >>>
>>>>>>>>> >>> First, I do not think "splitting sentences" is a good idea; each
>>>>>>>>> >>> language has different syntax, so how could you know when you should
>>>>>>>>> >>> split the sentence?
>>>>>>>>> >>
>>>>>>>>> >> Apertium works on the concept of a stream of words, so in the runtime
>>>>>>>>> >> we can't really rely on robust sentence segmentation.
>>>>>>>>> >>
>>>>>>>>> >> We can often use it, e.g. for training, but if sentence boundary detection
>>>>>>>>> >> were to be included, it would need to be trained, as Sevilay hints at.
>>>>>>>>> >>
>>>>>>>>> >> Also, I'm not sure how much we would gain from that.
>>>>>>>>> >>
>>>>>>>>> >>> Second, "substitute yasmet with other method": I think the result will
>>>>>>>>> >>> not be better if you substitute it with a statistical method.
>>>>>>>>> >>
>>>>>>>>> >> Substituting yasmet with a more up-to-date machine-learning method
>>>>>>>>> >> might be a worthwhile thing to do. What suggestions do you have?
>>>>>>>>> >>
>>>>>>>>> >> I think first we have to try the exact method with more than 3
>>>>>>>>> >> language pairs and then decide whether to substitute it or not, because
>>>>>>>>> >> what is the point of a new method if it doesn't achieve a gain? Then we
>>>>>>>>> >> can compare the results of the two methods and choose the best one.
>>>>>>>>> >> What do you think?
>>>>>>>>> >
>>>>>>>>> Yes, testing it with more language pairs is also a priority.
>>>>>>>>>
>>>>>>>>> Fran
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff