Yes, GRUs and LSTMs are better than traditional RNNs. I think we will use one of them.
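[Editor's note: for reference, a minimal sketch of what a GRU-based language model could look like, assuming PyTorch; the class and parameter names are illustrative only and not part of any existing Apertium code. Swapping nn.GRU for nn.LSTM gives the LSTM variant (its recurrent state is a (hidden, cell) tuple). The gating is what lets gradients flow across long spans, which is the advantage over a plain RNN mentioned above.]

import torch
import torch.nn as nn

class GRULanguageModel(nn.Module):
    """Minimal GRU language model: predicts the next token at every position."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, hidden=None):
        # tokens: (batch, seq_len) tensor of token ids
        emb = self.embed(tokens)
        output, hidden = self.gru(emb, hidden)
        return self.out(output), hidden  # logits over the next token

    def sentence_logprob(self, tokens):
        # Sum of log-probabilities of each token given its prefix,
        # i.e. the score the model assigns to the whole sentence.
        logits, _ = self.forward(tokens[:, :-1])
        logprobs = torch.log_softmax(logits, dim=-1)
        targets = tokens[:, 1:].unsqueeze(-1)
        return logprobs.gather(-1, targets).squeeze(-1).sum(dim=-1)

# Example (hypothetical vocabulary size):
# model = GRULanguageModel(vocab_size=10000)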
On Mon, Apr 22, 2019 at 12:08 AM Sevilay Bayatlı <sevilaybaya...@gmail.com> wrote:

> I agree with changing the n-gram LM, but with which one, RNN or GRU? As I see from the literature, GRU has more advantages than RNN.
>
> Sevilay
>
> On Mon, Apr 22, 2019 at 12:09 AM Aboelhamd Aly <aboelhamd.abotr...@gmail.com> wrote:
>
>> Hi Sevilay,
>>
>> I think a new language model that could distinguish the best ambiguous combination(s) of a translation would eliminate our need for the max entropy model or any other method.
>> But whether that is the case with an RNN LM, I don't know yet.
>> For now, do you agree that we need to change the LM first, or do you prefer going straight to an alternative method for max entropy? And do you have any idea for such an alternative method?
>> In my opinion, fixing all the bugs, evaluating our current system, and then changing the n-gram LM to an RNN is the priority plan for the next two weeks or so.
>> After that we can focus the research on what's next, if the accuracy is not good enough or there is room for improvement.
>> Do you agree with this?
>>
>> Regards,
>> Aboelhamd
>>
>> On Sun, Apr 21, 2019 at 10:48 PM Sevilay Bayatlı <sevilaybaya...@gmail.com> wrote:
>>
>>> Aboelhamd,
>>>
>>> I think using Gated Recurrent Units (GRUs) instead of the n-gram language model is a good idea; we can probably achieve more gain. However, the most important part here is changing the maximum entropy model.
>>>
>>> Let's see what Fran thinks about it.
>>>
>>> Regards,
>>>
>>> Sevilay
>>>
>>> On Fri, Apr 19, 2019 at 10:29 PM Aboelhamd Aly <aboelhamd.abotr...@gmail.com> wrote:
>>>
>>>> Hi Sevilay. Hi Francis,
>>>>
>>>> Unfortunately, Sevilay reported that the evaluation results of the kaz-tur and spa-eng pairs were very bad, with only 30% of the tested sentences being good compared to Apertium's LRLM resolution.
>>>> So we discussed what to do next, which is to utilize the breakthrough of deep learning neural networks in NLP, and especially in machine translation.
>>>> We also discussed using values of n greater than 5 in the already used n-gram language model, and evaluating the result of increasing n, which could give us some more insight into what to do next and how to do it.
>>>>
>>>> Since I have an intro to deep learning course this term in college, I spent the past two weeks being introduced to the application of deep learning in NLP and MT.
>>>> Now I have the basics of recurrent neural networks (RNNs) and why to use them instead of a standard feed-forward network in NLP, besides understanding their different architectures and the math done in forward and back propagation.
>>>> I also know how to build a simple language model, and how to avoid the vanishing gradient problem (which leads to not capturing long dependencies) by using Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) networks.
>>>>
>>>> As a next step, we will consider working only on the language model and leave the max entropy part for later discussions.
>>>> So along with trying different n values in the n-gram language model and evaluating the results, I will try either to use a ready RNNLM or to build a new one from scratch from what I have learnt so far. Honestly, I prefer the latter, because it will increase my experience in applying what I have learnt.
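[Editor's note: a brief aside on the n-gram scoring discussed in this message. Below is a minimal sketch of scoring candidate translation combinations with an n-gram LM, assuming the KenLM Python bindings and a hypothetical ARPA model file; the thread does not say which LM toolkit the project actually uses, so treat the names as placeholders. The detail that matters, and that comes up again further down, is that the scores are log10 probabilities: the best candidate is the one with the largest (least negative) score, not the largest magnitude.]

# pip install https://github.com/kpu/kenlm/archive/master.zip
import kenlm

# Hypothetical 5-gram (or 8-gram) ARPA model trained on target-language text.
model = kenlm.Model("target-lang.arpa")

# Candidate target-side realisations of one ambiguous sentence.
candidates = [
    "this is the first candidate translation",
    "this be first candidate the translation",
]

# model.score() returns a log10 probability, so all values are negative and
# the best candidate is the maximum (closest to zero), not the one with the
# largest magnitude.
best = max(candidates, key=lambda s: model.score(s, bos=True, eos=True))
print(best)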
>>>> In the last two weeks I implemented RNNs with GRUs and LSTMs, and also implemented a character-based language model, as two assignments, and they were very fun to do. So implementing a word-based RNN LM will not take much time, though it may not be close to the state-of-the-art models, and that is its disadvantage.
>>>>
>>>> Using an NNLM instead of the n-gram LM has these possible advantages:
>>>> - It automatically learns syntactic and semantic features.
>>>> - It overcomes the curse of dimensionality by generalizing better.
>>>>
>>>> ----------------------------------------------
>>>>
>>>> I tried using n=8 instead of 5 in the n-gram LM, but the scores weren't that different, as Sevilay pointed out in our discussion.
>>>> I knew that an NNLM is better than a statistical one, and also that using machine learning instead of the maximum entropy model would give better performance.
>>>> *But* the evaluation results were very, very disappointing, unexpected and illogical, so I thought there might be a bug in the code.
>>>> After some searching, I found that I had made a very silly *mistake* in normalizing the LM scores. Since the scores are the log base 10 of the sentence probability, the higher the magnitude, the lower the probability; but what I did was the inverse of that, and that was the cause of the very bad results.
>>>>
>>>> I am fixing this now and then will re-evaluate the results with Sevilay.
>>>>
>>>> Regards,
>>>> Aboelhamd
>>>>
>>>> On Sun, Apr 7, 2019 at 6:46 PM Aboelhamd Aly <aboelhamd.abotr...@gmail.com> wrote:
>>>>
>>>>> Thanks Sevilay for your feedback, and thanks for the resources.
>>>>>
>>>>> On Sun, 7 Apr 2019, 18:42 Sevilay Bayatlı <sevilaybaya...@gmail.com> wrote:
>>>>>
>>>>>> Hi Aboelhamd,
>>>>>>
>>>>>> Your proposal looks good. I found these resources that may be of benefit:
>>>>>>
>>>>>> Multi-source neural translation: https://arxiv.org/abs/1601.00710
>>>>>> Neural machine translation with extended context: https://arxiv.org/abs/1708.05943
>>>>>> Handling homographs in neural machine translation: https://arxiv.org/abs/1708.06510
>>>>>>
>>>>>> Sevilay
>>>>>>
>>>>>> On Sun, Apr 7, 2019 at 7:14 PM Aboelhamd Aly <aboelhamd.abotr...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have a not-yet-solid idea for an alternative to yasmet and the max entropy models:
>>>>>>> using neural networks to give us scores for the ambiguous rules.
>>>>>>> But I haven't yet settled on a formulation of the problem, nor the structure of the inputs, the outputs, or even the goal,
>>>>>>> as I think there are many formulations that we could adopt.
>>>>>>>
>>>>>>> For example, the most straightforward structure is to give the network all the possible combinations
>>>>>>> of a sentence's translations and let it choose the best one, or give them weights.
>>>>>>> Hence, the network learns which combinations to choose for a specific pair.
>>>>>>>
>>>>>>> Another example is, instead of building one network per pair,
>>>>>>> building one network per ambiguous pattern, as we did with the max entropy models.
>>>>>>> So we give the network the combinations for that pattern,
>>>>>>> and let it assign weights to the ambiguous rules applied to that pattern.
>>>>>>>
>>>>>>> And for each structure there are many details and questions yet to answer.
>>>>>>>
>>>>>>> So with that said, I decided to look at some papers to see what others have done before
>>>>>>> to tackle similar problems or the exact problem, and how some of them used machine learning
>>>>>>> or deep learning to solve these problems, and then try to build on them.
>>>>>>>
>>>>>>> Some papers' resolutions were very specific to the pairs they were developed for, and thus not very relevant to our case:
>>>>>>> 1) Resolving Structural Transfer Ambiguity in Chinese-to-Korean Machine Translation (2003): https://www.worldscientific.com/doi/10.1142/S0219427903000887
>>>>>>> 2) Arabic Machine Translation: A Developmental Perspective (2010): http://www.ieee.ma/IJICT/IJICT-SI-Bouzoubaa-3.3/2%20-%20paper_farghaly.pdf
>>>>>>>
>>>>>>> Some other papers tried not to generate ambiguous rules, or to minimize the ambiguity in transfer-rule inference, and didn't provide any methods to resolve the ambiguity in our case. I thought they might provide some help, but I think they are far from our topic:
>>>>>>> 1) Learning Transfer Rules for Machine Translation with Limited Data (2005): http://www.cs.cmu.edu/~kathrin/ThesisSummary/ThesisSummary.pdf
>>>>>>> 2) Inferring Shallow-Transfer Machine Translation Rules from Small Parallel Corpora (2009): https://arxiv.org/pdf/1401.5700.pdf
>>>>>>>
>>>>>>> Now I am looking into some more recent papers, like:
>>>>>>> 1) Rule Based Machine Translation Combined with Statistical Post Editor for Japanese to English Patent Translation (2007): http://www.mt-archive.info/MTS-2007-Ehara.pdf
>>>>>>> 2) Machine translation model using inductive logic programming (2009): https://scholar.cu.edu.eg/?q=shaalan/files/101.pdf
>>>>>>> 3) Machine Learning for Hybrid Machine Translation (2012): https://www.aclweb.org/anthology/W12-3138.pdf
>>>>>>> 4) Study and Comparison of Rule-Based and Statistical Catalan-Spanish Machine Translation Systems (2012): https://pdfs.semanticscholar.org/a731/0d0c15b22381c7b372e783d122a5324b005a.pdf?_ga=2.89511443.981790355.1554651923-676013054.1554651923
>>>>>>> 5) Latest trends in hybrid machine translation and its applications (2015): https://www.sciencedirect.com/science/article/pii/S0885230814001077
>>>>>>> 6) Machine Translation: Phrase-Based, Rule-Based and Neural Approaches with Linguistic Evaluation (2017): http://www.dfki.de/~ansr01/docs/MacketanzEtAl2017_CIT.pdf
>>>>>>> 7) A Multitask-Based Neural Machine Translation Model with Part-of-Speech Tags Integration for Arabic Dialects (2018): https://www.mdpi.com/2076-3417/8/12/2502/htm
>>>>>>>
>>>>>>> And I hope they give me some more insights and thoughts.
>>>>>>>
>>>>>>> --------------
>>>>>>>
>>>>>>> - So, do you have recommendations for other papers that address the same problem?
>>>>>>> - Also, about the proposal: I modified it a little bit and shared it through the GSoC website as a draft,
>>>>>>> so do you have any last feedback or thoughts about it, or should I just submit it as the final proposal?
>>>>>>> - Last thing, about the coding challenge (integrating weighted transfer rules with apertium-transfer):
>>>>>>> I think it's finished, but I didn't get any feedback or response about it,
>>>>>>> and the pull request is not merged into master yet.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Aboelhamd
>>>>>>>
>>>>>>> On Sat, Apr 6, 2019 at 5:23 AM Aboelhamd Aly <aboelhamd.abotr...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Sevilay, hi spectei,
>>>>>>>>
>>>>>>>> For sentence splitting, I think that we need to know neither the syntax nor the sentence boundaries of the language.
>>>>>>>> Also, I don't see any necessity for applying it at runtime, as at runtime we only get the score of each pattern,
>>>>>>>> so there is no need for splitting. I also had one thought on using beam search here, as I see it has no effect,
>>>>>>>> but maybe I am wrong. We can discuss it after we close this thread.
>>>>>>>>
>>>>>>>> We will handle the whole text as one unit and will depend only on the captured patterns,
>>>>>>>> knowing that, in chunker terms, successive patterns that don't share a transfer rule are independent.
>>>>>>>> So by using the lexical form of the text, we match the words with patterns, then match patterns with rules.
>>>>>>>> And hence we know which patterns are ambiguous and how many ambiguous rules they match.
>>>>>>>>
>>>>>>>> For example, say we have a text with the following patterns and corresponding rule counts:
>>>>>>>> p1:2 p2:1 p3:6 p4:4 p5:3 p6:5 p7:1 p8:4 p9:4 p10:6 p11:8 p12:5 p13:5 p14:1 p15:3 p16:2
>>>>>>>>
>>>>>>>> If such a text were handled by our old method of generating all possible combinations (the product of the rule counts),
>>>>>>>> we would have 82,944,000 possible combinations, which are not practical at all to score, and take heavy computation and memory.
>>>>>>>> And if it is handled by our new method of applying all the ambiguous rules of one pattern while fixing the other patterns at the LRLM rule
>>>>>>>> (the sum of the rule counts), we will have just 60 combinations, and not all of them different,
>>>>>>>> giving a drastically low number of combinations, which may not be very representative.
>>>>>>>>
>>>>>>>> But if we apply the splitting idea, we will have something in the middle that will hopefully avoid the disadvantages of both methods and benefit from the advantages of both.
>>>>>>>> Let's proceed from the start of the text to its end, while maintaining some threshold of, say, 24,000 combinations:
>>>>>>>> p1 => 2 ,, p1 p2 => 2 ,, p1 p2 p3 => 12 ,, p1 p2 p3 p4 => 48 ,, p1 p2 p3 p4 p5 => 144 ,,
>>>>>>>> p1 p2 p3 p4 p5 p6 => 720 ,, p1 p2 p3 p4 p5 p6 p7 => 720 ,,
>>>>>>>> p1 p2 p3 p4 p5 p6 p7 p8 => 2880 ,, p1 p2 p3 p4 p5 p6 p7 p8 p9 => 11520
>>>>>>>>
>>>>>>>> And then we stop here, because taking the next pattern would exceed the threshold.
>>>>>>>> Having our first split, we can now continue our work on it as usual,
>>>>>>>> but with more (though not overwhelming) combinations, which would capture more semantics.
>>>>>>>> After that, we take the next split, and so on (a small sketch of this procedure is included below).
>>>>>>>>
>>>>>>>> -----------
>>>>>>>>
>>>>>>>> I agree with you that testing the current method with more than one pair to know its accuracy is the priority,
>>>>>>>> and we are currently working on it.
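[Editor's note: a minimal sketch of the threshold-based splitting described in the message above; the function name and the threshold default are illustrative, not existing Apertium code. On the example counts it reproduces the numbers quoted there: the product over all sixteen patterns is 82,944,000, and with a 24,000 threshold the first split covers p1..p9 with 11,520 combinations.]

def split_by_combinations(pattern_rule_counts, threshold=24000):
    """Greedily group consecutive patterns so that the product of their
    ambiguous-rule counts stays at or below the threshold."""
    splits, current, product = [], [], 1
    for count in pattern_rule_counts:
        if current and product * count > threshold:
            splits.append(current)   # close the current split
            current, product = [], 1
        current.append(count)
        product *= count
    if current:
        splits.append(current)
    return splits

# Rule counts for p1..p16 from the example above.
counts = [2, 1, 6, 4, 3, 5, 1, 4, 4, 6, 8, 5, 5, 1, 3, 2]
print(split_by_combinations(counts))
# -> [[2, 1, 6, 4, 3, 5, 1, 4, 4], [6, 8, 5, 5, 1, 3, 2]]
# i.e. p1..p9 (11520 combinations) and p10..p16 (7200 combinations)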
>>>>>>>>
>>>>>>>> -----------
>>>>>>>>
>>>>>>>> As for an alternative to yasmet, I agree with spectei. Unfortunately, for now I don't have a solid idea to discuss,
>>>>>>>> but in the next few days I will try to come up with one or more ideas to discuss.
>>>>>>>>
>>>>>>>> On Fri, Apr 5, 2019 at 11:23 PM Francis Tyers <fty...@prompsit.com> wrote:
>>>>>>>>
>>>>>>>>> On 2019-04-05 20:57, Sevilay Bayatlı wrote:
>>>>>>>>> > On Fri, 5 Apr 2019, 22:41 Francis Tyers, <fty...@prompsit.com> wrote:
>>>>>>>>> >
>>>>>>>>> >> On 2019-04-05 19:07, Sevilay Bayatlı wrote:
>>>>>>>>> >>> Hi Aboelhamd,
>>>>>>>>> >>>
>>>>>>>>> >>> There are some points in your proposal:
>>>>>>>>> >>>
>>>>>>>>> >>> First, I do not think "splitting sentences" is a good idea; each
>>>>>>>>> >>> language has different syntax, so how could you know when you should
>>>>>>>>> >>> split the sentence?
>>>>>>>>> >>
>>>>>>>>> >> Apertium works on the concept of a stream of words, so in the runtime
>>>>>>>>> >> we can't really rely on robust sentence segmentation.
>>>>>>>>> >>
>>>>>>>>> >> We can often use it, e.g. for training, but if sentence boundary detection
>>>>>>>>> >> were to be included, it would need to be trained, as Sevilay hints at.
>>>>>>>>> >>
>>>>>>>>> >> Also, I'm not sure how much we would gain from that.
>>>>>>>>> >>
>>>>>>>>> >>> Second, "substitute yasmet with other method": I think the result will
>>>>>>>>> >>> not be better if you substitute it with a statistical method.
>>>>>>>>> >>
>>>>>>>>> >> Substituting yasmet with a more up-to-date machine-learning method
>>>>>>>>> >> might be a worthwhile thing to do. What suggestions do you have?
>>>>>>>>> >>
>>>>>>>>> >> I think first we have to try the exact method with more than 3
>>>>>>>>> >> language pairs and then decide whether to substitute it or not, because
>>>>>>>>> >> what is the point of a new method if it doesn't achieve a gain? Then we
>>>>>>>>> >> can compare the results of the two methods and choose the best one.
>>>>>>>>> >> What do you think?
>>>>>>>>> >
>>>>>>>>> Yes, testing it with more language pairs is also a priority.
>>>>>>>>>
>>>>>>>>> Fran
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff