Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

Aboelhamd Aly Sun, 21 Apr 2019 03:37:45 -0700

Hi,

I am uploading a daily summary of my work to this wiki page
<http://wiki.apertium.org/wiki/User:Aboelhamd/progress>.
Please take a look and let me know if there is something else I should do
instead.


Thanks.

On Fri, Apr 19, 2019 at 9:42 PM Aboelhamd Aly <aboelhamd.abotr...@gmail.com>
wrote:

> According to the timeline in my proposal, I am supposed to start
> phase 1 today.
> I would like to know what procedure to follow to document my work, day by
> day and week by week.
> Should I create a wiki page to record my progress?
> Or is there another way?
>
> Thanks
>
> On Fri, Apr 19, 2019 at 9:27 PM Aboelhamd Aly <
> aboelhamd.abotr...@gmail.com> wrote:
>
>> Hi Sevilay. Hi Francis,
>>
>> Unfortunately, Sevilay reported that the evaluation results for the kaz-tur
>> and spa-eng pairs were very bad, with only 30% of the tested sentences
>> being good, compared to Apertium's LRLM resolution.
>> So we discussed what to do next, which is to make use of the breakthroughs
>> of deep neural networks in NLP, and especially in machine translation.
>> We also discussed using values of n greater than 5 in the n-gram language
>> model already in use, and evaluating the results of increasing n, which
>> could give us more insight into what to do next and how to do it.
>>
>> Since I am taking an introductory deep learning course this term at
>> college, I waited these past two weeks to be introduced to the
>> applications of deep learning in NLP and MT.
>> Now I have basic knowledge of Recurrent Neural Networks (RNNs) and of why
>> to use them instead of a standard network in NLP, besides understanding
>> their different architectures and the math done in forward and back
>> propagation.
>> I also know how to build a simple language model, and how to avoid the
>> vanishing gradient problem, which leads to not capturing long
>> dependencies, by using Gated Recurrent Units (GRUs) and Long Short-Term
>> Memory (LSTM) networks.
>>
>> For the next step, we will work only on the language model and leave the
>> max entropy part for later discussion.
>> So along with trying different n values in the n-gram language model and
>> evaluating the results, I will try either to use a ready-made RNNLM or to
>> build a new one from scratch from what I have learnt so far. Honestly, I
>> prefer the latter because it will increase my experience in applying what
>> I have learnt.
>> In the last 2 weeks I implemented RNNs with GRUs and LSTM, and also
>> implemented a character-based language model, as two assignments, and they
>> were very fun to do. So implementing an RNN word-based LM will not take
>> much time, though it may not be close to the state-of-the-art models, and
>> this is its disadvantage.
>>
>> Using an NNLM instead of the n-gram LM has these possible advantages:
>> - It automatically learns syntactic and semantic features.
>> - It overcomes the curse of dimensionality by generalizing better.
>>
>> ----------------------------------------------
>>
>> I tried using n=8 instead of 5 in the n-gram LM, but the scores weren't
>> that different, as Sevilay pointed out in our discussion.
>> I knew that an NNLM is better than a statistical one, and also that using
>> machine learning instead of the maximum entropy model will give better
>> performance.
>> *But* the evaluation results were very disappointing, unexpected and
>> illogical, so I thought there might be a bug in the code.
>> And after some searching, I found that I had made a very silly *mistake*
>> in normalizing the LM scores. Since the scores are the log base 10 of the
>> sentence probability, a score that is higher in magnitude means a lower
>> probability, but what I did was the inverse of that, and that was the
>> cause of the very bad results.
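
A minimal sketch of the normalization described above, assuming the LM returns the log base 10 of each sentence probability. The function name `normalize_log10_scores` and the sample scores are hypothetical and are not taken from the project code.

```python
def normalize_log10_scores(log10_scores):
    """Turn log10 sentence probabilities into weights that sum to 1.

    A score closer to zero (smaller magnitude) means a higher probability,
    so it must receive the larger weight.
    """
    # Subtract the maximum before exponentiating to avoid underflow.
    best = max(log10_scores)
    unnormalized = [10 ** (score - best) for score in log10_scores]
    total = sum(unnormalized)
    return [value / total for value in unnormalized]


# Hypothetical LM scores for three candidate translations of one sentence.
scores = [-12.5, -10.0, -15.0]
weights = normalize_log10_scores(scores)
```

With these sample values the second candidate (score -10.0) gets the largest weight and the third (score -15.0) the smallest, which is the ordering the buggy version inverted.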
>>
>> I am fixing this now and then will re-evaluate the results with Sevilay.
>>
>> Regards,
>> Aboelhamd
>>
>>
>> On Sun, Apr 7, 2019 at 6:46 PM Aboelhamd Aly <
>> aboelhamd.abotr...@gmail.com> wrote:
>>
>>> Thanks Sevilay for your feedback, and thanks for the resources.
>>>
>>> On Sun, 7 Apr 2019, 18:42 Sevilay Bayatlı <sevilaybaya...@gmail.com
>>> wrote:
>>>
>>>> Hi Aboelhamd,
>>>>
>>>> Your proposal looks good. I found these resources, which may be of benefit.
>>>>
>>>>
>>>>
>>>> Multi-source neural translation
>>>> https://arxiv.org/abs/1601.00710
>>>>
>>>> Neural machine translation with extended context
>>>> https://arxiv.org/abs/1708.05943
>>>>
>>>> Handling homographs in neural machine translation
>>>> https://arxiv.org/abs/1708.06510
>>>>
>>>>
>>>> Sevilay
>>>>
>>>> On Sun, Apr 7, 2019 at 7:14 PM Aboelhamd Aly <
>>>> aboelhamd.abotr...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I have an idea, not yet solid, for an alternative to yasmet and max
>>>>> entropy models:
>>>>> using neural networks to give us scores for the ambiguous rules.
>>>>> But I have not yet settled on a formulation of the problem, nor on the
>>>>> structure of the inputs, the outputs, or even the goal,
>>>>> as I think there are many formulations that we could adopt.
>>>>>
>>>>> For example, the most straightforward structure is to give the
>>>>> network all the possible combinations
>>>>> of a sentence's translations and let it choose the best one, or give
>>>>> them weights.
>>>>> Hence, the network learns which combinations to choose for a
>>>>> specific pair.
>>>>>
>>>>> Another example is, instead of building one network per pair,
>>>>> to build one network per ambiguous pattern, as we did with the max
>>>>> entropy models.
>>>>> So we give the network the combinations for that pattern,
>>>>> and let it assign weights to the ambiguous rules applied to that
>>>>> pattern.
>>>>>
>>>>> And for each structure there are many details and questions still to
>>>>> answer.
>>>>>
>>>>> With that said, I decided to look at some papers to see what others
>>>>> have done before
>>>>> to tackle similar problems or this exact problem, and how some of
>>>>> them used machine learning
>>>>> or deep learning to solve these problems, and then try to build on them.
>>>>>
>>>>> The resolution methods in some papers were very specific to the pairs
>>>>> they were developed for, and thus were not very relevant to our case:
>>>>> 1) Resolving Structural Transfer Ambiguity in Chinese-to-Korean
>>>>> Machine Translation
>>>>> <https://www.worldscientific.com/doi/10.1142/S0219427903000887>.(2003)
>>>>> 2) Arabic Machine Translation: A Developmental Perspective
>>>>> <http://www.ieee.ma/IJICT/IJICT-SI-Bouzoubaa-3.3/2%20-%20paper_farghaly.pdf>
>>>>> .(2010)
>>>>>
>>>>> Some other papers tried not to generate ambiguous rules, or to minimize
>>>>> the ambiguity during transfer rule inference, and didn't provide any
>>>>> methods to resolve the ambiguity in our case. I thought that they might
>>>>> provide some help, but I think they are far from our topic:
>>>>> 1) Learning Transfer Rules for Machine Translation with Limited Data
>>>>> <http://www.cs.cmu.edu/~kathrin/ThesisSummary/ThesisSummary.pdf>
>>>>> .(2005)
>>>>> 2) Inferring Shallow-Transfer Machine Translation Rules from Small
>>>>> Parallel Corpora <https://arxiv.org/pdf/1401.5700.pdf>.(2009)
>>>>>
>>>>> Now I am looking into some more recent papers, such as:
>>>>> 1) Rule Based Machine Translation Combined with Statistical Post
>>>>> Editor for Japanese to English Patent Translation
>>>>> <http://www.mt-archive.info/MTS-2007-Ehara.pdf>.(2007)
>>>>> 2) Machine translation model using inductive logic programming
>>>>> <https://scholar.cu.edu.eg/?q=shaalan/files/101.pdf>.(2009)
>>>>> 3) Machine Learning for Hybrid Machine Translation
>>>>> <https://www.aclweb.org/anthology/W12-3138.pdf>.(2012)
>>>>> 4) Study and Comparison of Rule-Based and Statistical Catalan-Spanish
>>>>> Machine Translation Systems
>>>>> <https://pdfs.semanticscholar.org/a731/0d0c15b22381c7b372e783d122a5324b005a.pdf?_ga=2.89511443.981790355.1554651923-676013054.1554651923>
>>>>> .(2012)
>>>>> 5) Latest trends in hybrid machine translation and its applications
>>>>> <https://www.sciencedirect.com/science/article/pii/S0885230814001077>
>>>>> .(2015)
>>>>> 6) Machine Translation: Phrase-Based, Rule-Based and Neural Approaches
>>>>> with Linguistic Evaluation
>>>>> <http://www.dfki.de/~ansr01/docs/MacketanzEtAl2017_CIT.pdf>.(2017)
>>>>> 7) A Multitask-Based Neural Machine Translation Model with
>>>>> Part-of-Speech Tags Integration for Arabic Dialects
>>>>> <https://www.mdpi.com/2076-3417/8/12/2502/htm>.(2018)
>>>>>
>>>>> And I hope they give me some more insights and thoughts.
>>>>>
>>>>> --------------
>>>>>
>>>>> - So do you have recommendations for other papers that address the
>>>>> same problem?
>>>>> - Also, about the proposal: I modified it a little and shared it
>>>>> through the GSoC website as a draft,
>>>>>  so do you have any last feedback or thoughts about it, or should I just
>>>>> submit it as a final proposal?
>>>>> - Last thing, about the coding challenge (integrating weighted transfer
>>>>> rules with apertium-transfer):
>>>>>  I think it's finished, but I haven't received any feedback or response
>>>>> about it, and the pull request is not yet merged into master.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Aboelhamd
>>>>>
>>>>>
>>>>> On Sat, Apr 6, 2019 at 5:23 AM Aboelhamd Aly <
>>>>> aboelhamd.abotr...@gmail.com> wrote:
>>>>>
>>>>>> Hi Sevilay, hi spectei,
>>>>>>
>>>>>> For sentence splitting, I think that we need to know neither the
>>>>>> syntax nor the sentence boundaries of the language.
>>>>>> Also, I don't see any need to apply it at runtime, as at
>>>>>> runtime we only get the score of each pattern,
>>>>>> where there is no need for splitting. I also have a thought about using
>>>>>> beam search here, as I think it has no effect,
>>>>>> though maybe I am wrong. We can discuss it after we close this
>>>>>> thread.
>>>>>>
>>>>>> We will handle the whole text as one unit and will depend only on the
>>>>>> captured patterns.
>>>>>> Note that, in chunker terms, successive patterns that don't
>>>>>> share a transfer rule are independent.
>>>>>> So by using the lexical forms of the text, we match the words with
>>>>>> patterns, then match the patterns with rules.
>>>>>> Hence we know which patterns are ambiguous and how many ambiguous
>>>>>> rules they match.
>>>>>>
>>>>>> For example, suppose we have a text with the following patterns and
>>>>>> the corresponding numbers of rules:
>>>>>> p1:2  p2:1  p3:6  p4:4  p5:3  p6:5  p7:1  p8:4  p9:4  p10:6  p11:8
>>>>>> p12:5  p13:5  p14:1  p15:3  p16:2
>>>>>>
>>>>>> If such a text were handled by our old method of generating all
>>>>>> possible combinations (the product of the rule counts),
>>>>>> we would have 82944000 possible combinations, which are not practical
>>>>>> at all to score, and would take heavy computation and memory.
>>>>>> And if it is handled by our new method of applying all the ambiguous
>>>>>> rules of one pattern while fixing the other patterns at the LRLM rule
>>>>>> (the sum of the rule counts), we will have just 60 combinations, not
>>>>>> all of them different, giving a drastically lower number of
>>>>>> combinations, which may not be very representative.
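
The two counts above can be checked with a short sketch. The list `rule_counts` copies the per-pattern rule numbers for p1..p16 from the example; the variable names are only illustrative.

```python
import math

# Number of ambiguous rules matched by each pattern p1..p16 (from the example).
rule_counts = [2, 1, 6, 4, 3, 5, 1, 4, 4, 6, 8, 5, 5, 1, 3, 2]

# Old method: every rule of every pattern is combined with every rule of
# every other pattern, so the counts multiply.
all_combinations = math.prod(rule_counts)

# New method: vary the rules of one pattern at a time while the other
# patterns stay fixed at their LRLM rule, so the counts add up.
one_at_a_time = sum(rule_counts)

print(all_combinations, one_at_a_time)
```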
>>>>>>
>>>>>> But if we apply the splitting idea, we will have something in the
>>>>>> middle, which will hopefully avoid the disadvantages of both methods
>>>>>> and benefit from the advantages of both, too.
>>>>>> Let's proceed from the start of the text to the end of it, while
>>>>>> maintaining some threshold of, say, 24000 combinations.
>>>>>> p1 => 2
>>>>>> p1  p2 => 2
>>>>>> p1  p2  p3 => 12
>>>>>> p1  p2  p3  p4 => 48
>>>>>> p1  p2  p3  p4  p5 => 144
>>>>>> p1  p2  p3  p4  p5  p6 => 720
>>>>>> p1  p2  p3  p4  p5  p6  p7 => 720
>>>>>> p1  p2  p3  p4  p5  p6  p7  p8 => 2880
>>>>>> p1  p2  p3  p4  p5  p6  p7  p8  p9 => 11520
>>>>>>
>>>>>> And then we stop here, because taking the next pattern would exceed
>>>>>> the threshold.
>>>>>> Having our first split, we can now continue our work on it as
>>>>>> usual,
>>>>>> but with more (yet not overwhelming) combinations, which would capture
>>>>>> more semantics.
>>>>>> After that, we take the next split and so on.
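
A minimal sketch of the splitting procedure walked through above, assuming a simple greedy left-to-right pass. The function name `split_patterns` is hypothetical and this is not the project's implementation. The rule counts and the threshold of 24000 come from the example.

```python
def split_patterns(rule_counts, threshold):
    """Greedily group consecutive patterns into splits.

    A split is closed as soon as adding the next pattern would push the
    product of rule counts above the threshold. A single pattern whose own
    count exceeds the threshold still gets a split of its own.
    """
    splits = []
    current, product = [], 1
    for index, count in enumerate(rule_counts, start=1):
        if current and product * count > threshold:
            splits.append((current, product))
            current, product = [], 1
        current.append(f"p{index}")
        product *= count
    if current:
        splits.append((current, product))
    return splits


rule_counts = [2, 1, 6, 4, 3, 5, 1, 4, 4, 6, 8, 5, 5, 1, 3, 2]
splits = split_patterns(rule_counts, threshold=24000)
for patterns, combinations in splits:
    print(patterns[0], "..", patterns[-1], "=>", combinations)
```

On the example text this sketch closes the first split at p9 with 11520 combinations, as in the walk-through, and puts p10 to p16 into a second split with 7200 combinations.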
>>>>>>
>>>>>> -----------
>>>>>>
>>>>>> I agree with you that testing the current method with more than one
>>>>>> pair to know its accuracy is the priority,
>>>>>> and we are currently working on it.
>>>>>>
>>>>>> -----------
>>>>>>
>>>>>> As for an alternative to yasmet, I agree with spectei. Unfortunately,
>>>>>> for now I don't have a solid idea to discuss.
>>>>>> But in the next few days, I will try to come up with one or more ideas
>>>>>> to discuss.
>>>>>>
>>>>>>
>>>>>> On Fri, Apr 5, 2019 at 11:23 PM Francis Tyers <fty...@prompsit.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On 2019-04-05 20:57, Sevilay Bayatlı wrote:
>>>>>>> > On Fri, 5 Apr 2019, 22:41 Francis Tyers, <fty...@prompsit.com>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> >> On 2019-04-05 19:07, Sevilay Bayatlı wrote:
>>>>>>> >>> Hi Aboelhamd,
>>>>>>> >>>
>>>>>>> >>> There are some points about your proposal:
>>>>>>> >>>
>>>>>>> >>> First, I do not think "splitting sentence" is a good idea. Each
>>>>>>> >>> language has a different syntax, so how could you know when you
>>>>>>> >>> should split the sentence?
>>>>>>> >>
>>>>>>> >> Apertium works on the concept of a stream of words, so in the
>>>>>>> >> runtime we can't really rely on robust sentence segmentation.
>>>>>>> >>
>>>>>>> >> We can often use it, e.g. for training, but if sentence boundary
>>>>>>> >> detection were to be included, it would need to be trained, as
>>>>>>> >> Sevilay hints at.
>>>>>>> >>
>>>>>>> >> Also, I'm not sure how much we would gain from that.
>>>>>>> >>
>>>>>>> >>> Second, "substitute yasmet with other method": I think the result
>>>>>>> >>> will not be any better if you substitute it with a statistical
>>>>>>> >>> method.
>>>>>>> >>>
>>>>>>> >>
>>>>>>> >> Substituting yasmet with a more up to date machine-learning method
>>>>>>> >> might be a worthwhile thing to do. What suggestions do you have?
>>>>>>> >>
>>>>>>> >> I think first we have to try the exact method with more than 3
>>>>>>> >> language pairs and then decide whether to substitute it or not,
>>>>>>> >> because what is the point of a new method if it doesn't achieve a
>>>>>>> >> gain? Then we can compare the results of the two methods and
>>>>>>> >> choose the best one. What do you think?
>>>>>>> >
>>>>>>>
>>>>>>> Yes, testing it with more language pairs is also a priority.
>>>>>>>
>>>>>>> Fran
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Apertium-stuff mailing list
>>>>>>> Apertium-stuff@lists.sourceforge.net
>>>>>>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>>>>>>

