Hi all,

I have a not-yet-solid idea for an alternative to yasmet and maximum
entropy models: using neural networks to give us scores for the
ambiguous rules. But I haven't yet settled on a formulation of the
problem, nor on the structure of the inputs and outputs, or even the
exact goal, as I think there are many formulations we could adopt.

For example, the most straightforward structure is to give the network
all the possible combinations of a sentence's translations and let it
choose the best one, or assign them weights. The network would thus
learn which combinations to choose for a specific language pair (see
the sketch below).
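
To make this concrete, here is a minimal PyTorch sketch of that first
formulation. Everything in it is a placeholder assumption, not a
worked-out design: the per-combination feature vector (say, the
identities of the applied rules plus a target-side language-model
score), the layer sizes, and the training signal.

    import torch
    import torch.nn as nn

    class CombinationScorer(nn.Module):
        # Scores one candidate rule combination for a sentence.
        # n_features stands in for whatever features we extract
        # per combination (rule identities, LM score, ...).
        def __init__(self, n_features, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_features, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1))

        def forward(self, feats):
            # feats: (n_combinations, n_features) for one sentence
            return self.net(feats).squeeze(-1)  # one raw score each

    # Training idea: softmax over all combinations of one sentence,
    # cross-entropy against the index of the reference-best one.
    scorer = CombinationScorer(n_features=16)
    feats = torch.randn(60, 16)  # dummy: 60 candidate combinations
    best = torch.tensor([7])     # dummy index of the best combination
    loss = nn.functional.cross_entropy(scorer(feats).unsqueeze(0), best)

At translation time we would just take the argmax, or keep the softmax
values if we want weights rather than a single choice.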

Another option, instead of building one network per language pair, is
to build one network per ambiguous pattern, as we did with the maximum
entropy models. So we give the network the combinations for that
pattern and let it assign weights to the ambiguous rules applied to
that pattern (again, a sketch below).
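
A similarly hedged sketch of the per-pattern variant, with the same
caveat that the context features and sizes are made-up placeholders:

    import torch
    import torch.nn as nn

    class PatternRuleScorer(nn.Module):
        # One small network per ambiguous pattern: maps context
        # features to a weight distribution over the n_rules
        # ambiguous rules that match this pattern.
        def __init__(self, n_features, n_rules, hidden=32):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_features, hidden),
                nn.ReLU(),
                nn.Linear(hidden, n_rules))

        def forward(self, context):
            # context: (batch, n_features); each output row sums to 1
            return self.net(context).softmax(dim=-1)

    # e.g. a pattern matched by 4 ambiguous rules:
    weights = PatternRuleScorer(n_features=16, n_rules=4)(torch.randn(1, 16))

Since this mirrors the one-model-per-pattern setup we already have with
yasmet, the resulting weights could presumably be written out in the
same weighted-rules format that the coding challenge integrates into
apertium-transfer.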

And for each structure there are many details and questions still to answer.

So with that said, I decided to look at some papers to see what others
have done before to tackle similar problems (or this exact one), how
some of them used machine learning or deep learning to solve them, and
then to try to build on their work.

Some papers' resolutions were very specific to the pairs they were
developed for, and thus not very relevant to our case:
1) Resolving Structural Transfer Ambiguity in Chinese-to-Korean Machine
Translation <https://www.worldscientific.com/doi/10.1142/S0219427903000887>
(2003)
2) Arabic Machine Translation: A Developmental Perspective
<http://www.ieee.ma/IJICT/IJICT-SI-Bouzoubaa-3.3/2%20-%20paper_farghaly.pdf>
(2010)

Some other papers tried not to generate ambiguous rules in the first
place, or to minimize the ambiguity in transfer-rule inference, and
didn't provide any method to resolve the ambiguity in our case. I
thought they might be of some help, but I think they are far from our
topic:
1) Learning Transfer Rules for Machine Translation with Limited Data
<http://www.cs.cmu.edu/~kathrin/ThesisSummary/ThesisSummary.pdf> (2005)
2) Inferring Shallow-Transfer Machine Translation Rules from Small
Parallel Corpora <https://arxiv.org/pdf/1401.5700.pdf> (2009)

Now I am looking into some more recent papers, such as:
1) Rule Based Machine Translation Combined with Statistical Post Editor
for Japanese to English Patent Translation
<http://www.mt-archive.info/MTS-2007-Ehara.pdf> (2007)
2) Machine translation model using inductive logic programming
<https://scholar.cu.edu.eg/?q=shaalan/files/101.pdf> (2009)
3) Machine Learning for Hybrid Machine Translation
<https://www.aclweb.org/anthology/W12-3138.pdf> (2012)
4) Study and Comparison of Rule-Based and Statistical Catalan-Spanish
Machine Translation Systems
<https://pdfs.semanticscholar.org/a731/0d0c15b22381c7b372e783d122a5324b005a.pdf?_ga=2.89511443.981790355.1554651923-676013054.1554651923>
(2012)
5) Latest trends in hybrid machine translation and its applications
<https://www.sciencedirect.com/science/article/pii/S0885230814001077> (2015)
6) Machine Translation: Phrase-Based, Rule-Based and Neural Approaches
with Linguistic Evaluation
<http://www.dfki.de/~ansr01/docs/MacketanzEtAl2017_CIT.pdf> (2017)
7) A Multitask-Based Neural Machine Translation Model with Part-of-Speech
Tags Integration for Arabic Dialects
<https://www.mdpi.com/2076-3417/8/12/2502/htm> (2018)

I hope they will give me some more insights and ideas.

--------------

- Do you have recommendations for other papers that address the same
problem?
- Also, about the proposal: I modified it a little and shared it through
the GSoC website as a draft. Do you have any last feedback or thoughts
on it, or should I just submit it as the final proposal?
- One last thing, about the coding challenge (integrating weighted
transfer rules with apertium-transfer): I think it's finished, but I
haven't had any feedback or response about it, and the pull request is
not yet merged into master.


Thanks,
Aboelhamd


On Sat, Apr 6, 2019 at 5:23 AM Aboelhamd Aly <aboelhamd.abotr...@gmail.com>
wrote:

> Hi Sevilay, hi spectei,
>
> For sentence splitting, I think that we don't need to know either the
> syntax or the sentence boundaries of the language. I also don't see
> any necessity for applying it at runtime: at runtime we only get the
> score of each pattern, so there is no need for splitting. I also had a
> thought about using beam search here; as I see it, it would have no
> effect, but maybe I am wrong. We can discuss it after we close this
> thread.
>
> We will handle the whole text as one unit and depend only on the
> captured patterns. Note that, in chunker terms, successive patterns
> that don't share a transfer rule are independent. So, using the
> lexical form of the text, we match the words with patterns, then match
> the patterns with rules, and hence we know which patterns are
> ambiguous and how many ambiguous rules they match.
>
> For example, suppose we have a text with the following patterns and
> corresponding rule counts:
> p1:2  p2:1  p3:6  p4:4  p5:3  p6:5  p7:1  p8:4  p9:4  p10:6  p11:8  p12:5
> p13:5  p14:1  p15:3  p16:2
>
> If such a text were handled by our old method, generating all the
> possible combinations (the product of the rule counts), we would have
> 82,944,000 combinations, which is not at all practical to score and
> takes heavy computation and memory. If it is handled by our new
> method, applying all the ambiguous rules of one pattern while fixing
> the other patterns to the LRLM rule (the sum of the rule counts), we
> will have just 60 combinations, and not all of them distinct; that is
> a drastically lower number of combinations, which may not be very
> representative.
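>
> A quick sanity check of those two counts (plain Python, nothing
> assumed beyond the rule counts above):
>
>     from math import prod  # Python 3.8+
>
>     rules = [2, 1, 6, 4, 3, 5, 1, 4, 4, 6, 8, 5, 5, 1, 3, 2]
>     print(prod(rules))  # 82944000 combinations, old method (product)
>     print(sum(rules))   # 60 combinations, new method (sum)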
>
> But if we apply the splitting idea, we get something in between that
> will hopefully avoid the disadvantages of both methods and benefit
> from the advantages of both. Let's proceed from the start of the text
> to its end, while maintaining a threshold of, say, 24000 combinations:
> p1 => 2 ,, p1..p2 => 2 ,, p1..p3 => 12 ,, p1..p4 => 48 ,,
> p1..p5 => 144 ,, p1..p6 => 720 ,, p1..p7 => 720 ,,
> p1..p8 => 2880 ,, p1..p9 => 11520
>
> And then we stop here, because taking the next pattern (p10, with 6
> rules) would exceed the threshold. Having our first split, we can now
> continue our work on it as usual, but with more (yet not overwhelming)
> combinations, which should capture more semantics. After that we take
> the next split, and so on; a small code sketch of this follows below.
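>
> Here is a minimal Python sketch of that greedy split (the function
> name and the default threshold are just for illustration):
>
>     def split_patterns(rule_counts, threshold=24000):
>         # Greedily grow a split until adding the next pattern's
>         # rule count would push the running product of counts
>         # past the threshold, then start a new split.
>         splits, current, product = [], [], 1
>         for n in rule_counts:
>             if current and product * n > threshold:
>                 splits.append(current)
>                 current, product = [], 1
>             current.append(n)
>             product *= n
>         if current:
>             splits.append(current)
>         return splits
>
>     counts = [2, 1, 6, 4, 3, 5, 1, 4, 4, 6, 8, 5, 5, 1, 3, 2]
>     # first split: [2, 1, 6, 4, 3, 5, 1, 4, 4] (product 11520)
>     print(split_patterns(counts))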
>
> -----------
>
> I agree with you that testing the current method with more than one
> pair to know its accuracy is the priority, and we are currently
> working on it.
>
> -----------
>
> As for an alternative to yasmet, I agree with spectei. Unfortunately,
> for now I don't have a solid idea to discuss, but in the next few days
> I will try to come up with one or more ideas.
>
>
> On Fri, Apr 5, 2019 at 11:23 PM Francis Tyers <fty...@prompsit.com> wrote:
>
>> On 2019-04-05 20:57, Sevilay Bayatlı wrote:
>> > On Fri, 5 Apr 2019, 22:41 Francis Tyers, <fty...@prompsit.com> wrote:
>> >
>> >> On 2019-04-05 19:07, Sevilay Bayatlı wrote:
>> >>> Hi Aboelhamd,
>> >>>
>> >>> There are some points in your proposal:
>> >>>
>> >>> First, I do not think "splitting sentences" is a good idea; each
>> >>> language has a different syntax, so how would you know where you
>> >>> should split the sentence?
>> >>
>> >> Apertium works on the concept of a stream of words, so at runtime
>> >> we can't really rely on robust sentence segmentation.
>> >>
>> >> We can often use it, e.g. for training, but if sentence boundary
>> >> detection
>> >> were to be included, it would need to be trained, as Sevilay hints
>> >> at.
>> >>
>> >> Also, I'm not sure how much we would gain from that.
>> >>
>> >>> Second, on "substituting yasmet with another method": I think the
>> >>> result will not be better if you substitute it with another
>> >>> statistical method.
>> >>>
>> >>
>> >> Substituting yasmet with a more up-to-date machine-learning method
>> >> might be a worthwhile thing to do. What suggestions do you have?
>> >>
>> >> I think first we have to try the existing method with more than 3
>> >> language pairs and then decide whether to substitute it or not,
>> >> because what is the point of a new method if it doesn't achieve any
>> >> gain? Then we can compare the results of the two methods and choose
>> >> the best one. What do you think?
>> >
>>
>> Yes, testing it with more language pairs is also a priority.
>>
>> Fran
>>
>>
>