On 2019-04-29 12:21, Aboelhamd Aly wrote:
---------- Forwarded message ---------
From: ABOELHAMD ALY <aboelhamd.abotr...@gmail.com>
Date: Mon, 29 Apr 2019, 03:14
Subject: Re: [Apertium-stuff] Improve apertium-ambiguous
To: <apertium-stuff@lists.sourceforge.net>

Hi all,

Our last evaluation was as follows:

Our scores are better than Apertium's LRLM in WER and PER, but still
worse in BLEU.

--------------------------------

In response to Francis's thoughts:

- I agree with you that we should know the advantages and
disadvantages of any method before replacing it or trying it.

- I think that maximum entropy is not the bottleneck of our model. As
I understand it, it just chooses the best probability distribution
over the rules that apply to a pattern of lemmas, so it is a kind of
approximation of the language model. Thus what we should improve is
the language model itself, and if we had an LM that was accurate and
fast enough, we would not need the maximum entropy model or any other
substitute.

The benefit of having a source-side model is that we don't then have
to perform all of the translations before scoring and selecting.

This makes the performance much better.
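
To make that contrast concrete, here is a rough Python sketch of the
two selection strategies (translate, lm_score, maxent_score and the
word objects with a .lemma field are all illustrative stand-ins, not
the actual code):

  # Target-side (TLM) selection: every candidate must first be
  # translated, then scored by the language model.
  def select_with_tlm(pattern, ambiguous_rules, translate, lm_score):
      candidates = [(rule, translate(pattern, rule)) for rule in ambiguous_rules]
      return max(candidates, key=lambda rc: lm_score(rc[1]))[0]

  # Source-side selection: a trained model picks a rule directly from
  # source-language features (e.g. the lemmas of the pattern), so no
  # translation has to be produced at selection time.
  def select_with_source_model(pattern, ambiguous_rules, maxent_score):
      features = [word.lemma for word in pattern]
      return max(ambiguous_rules, key=lambda rule: maxent_score(features, rule))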

- For determining the role of data size, we have here the target
corpus that trains the LM and the source corpus that is used to train
the maximum entropy models with ambiguous rules. So, as I understand
it, we should take the same percentages of both corpora to train and
analyse. Is that right?

- On adding more features, such as tags: the same point about maximum
entropy applies here. But if we want to add more features, could
Yasmet do that, or should we use another tool?

It's possible to do that in Yasmet.

- For the semi-oracle system, our generated translations each have one
ambiguous rule applied to one pattern and LRLM rules for the others,
and this is done for every ambiguous rule and every pattern. So is it
right to choose the best translation from these "samples", or should
we generate all the combinations? And if we were to generate them,
that would get us back to the old problem of too many combinations.

From the samples.

- For SVM or any other "supervised" machine learning algorithm, I
don't yet see how it would be applied to our problem, which doesn't
have labelled data and hence would seem to call for an unsupervised
algorithm. With each entry in the training data set being the lemmas
of a pattern along with the scores of the applied ambiguous rules, we
should end up with a model that decides which rule to choose given
the lemmas as input. Maybe clustering or unsupervised deep learning
could be used.

Maximum entropy is also a supervised approach. The fractional counts
are used instead of co-occurrence counts. E.g. instead of having

  X is seen with Y 10 times,

you have

  X is seen with Y_1 0.5 times
  X is seen with Y_2 0.2 times
  X is seen with Y_3 0.3 times
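
For concreteness, here is a minimal Python sketch (with invented
log-scores) of how the target-LM scores of the candidate translations
of one pattern could be turned into fractional counts like the ones
above; the exact scaling used in the real pipeline may differ:

  def fractional_counts(scored_rules):
      # scored_rules: (rule_id, lm_log10_score) pairs for the candidate
      # translations of one source pattern.  Returns (rule_id, fraction)
      # pairs summing to 1.0 -- the fractional counts that replace whole
      # co-occurrence counts in the training data.
      probs = {rule: 10.0 ** score for rule, score in scored_rules}
      total = sum(probs.values())
      return [(rule, p / total) for rule, p in probs.items()]

  # e.g. three ambiguous rules scored by the LM for one pattern X
  print(fractional_counts([("Y_1", -12.1), ("Y_2", -12.5), ("Y_3", -12.3)]))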

- For NNs and deep learning, we know that the larger the data, the
better the results; the results you mentioned from this paper [1],
which compared NMT and SMT, show that quality for NMT starts much
lower than for SMT and that it needs huge amounts of data to
outperform SMT.

But those results are about machine translation, not language models.
As I saw in these two papers [2] and [3], state of the art at the
time, an RNNLM outperforms an n-gram LM (a 5-gram with modified
Kneser-Ney smoothing, which is exactly the one we use, implemented by
the KenLM [4] tool), and it does so on all the training corpora of all
sizes (from 200K words to 37M words); also, some mixtures of different
LMs gave the best results. One of the comparison tables is the one on
the Penn corpus, which has only 1M words.

For comparison, the Turkish Wikipedia corpus has 2.5M lines (a line
can contain many sentences) with 52M words and 3.2M unique words,
which form the vocabulary of the LM; a typical vocabulary size in
commercial use is 30K-100K, and of course it may be more than that.
Also, the Europarl English corpus has 1.9M lines with 49M words and
300K unique words.

There have also been some great advances in RNN architectures that
solve some of the problems encountered early on, such as GRU and LSTM
units and bi-directional RNNs. Another thing is word embeddings (the
input to the NN), which capture semantic and syntactic similarity and
relations between words, and that helps with understanding the
context.

It should be fairly easy to check whether an NN architecture is better:
just swap out KenLM for a neural LM and run the TLM evaluation again
(i.e. generate everything, then score and pick the highest-scoring
translation).

The TLM should provide an upper bound on the quality of translations
that we can get, i.e. the idea of the system is to get the same (or
close to the same) performance as using the language model, without
having to perform all the translations.
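
A minimal sketch of that experiment, assuming the kenlm Python
bindings (the model path is made up, and score_neural is a
hypothetical stand-in for whatever neural LM gets tried):

  import kenlm

  ngram_lm = kenlm.Model("eng.binary")   # the KenLM model used now

  def pick_best(candidates, score=ngram_lm.score):
      # candidates: all generated translations of one sentence; return
      # the one the scoring function likes best.
      return max(candidates, key=score)

  # Swapping the LM is then just:
  #   pick_best(candidates, score=score_neural)
  # where score_neural(sentence) returns a (log-)probability from the
  # neural LM.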

Obviously, the better the quality signal from the target side, the
better the results on the source side will be.

---------------------------------------------

Some unsolved issues:

- The programs have undefined behaviour on Ubuntu 18; maybe the cause
is the newer version of the compiler, because when I debugged the
code, it complained about some accesses to the hash maps. Since I
don't have much experience with the different C++ versions and
compilers, I can't tell the real cause, but I think it will be solved
with more debugging.

This sounds like a question for Tino; be sure to file GitHub issues
and ask him to take a look.

- Memduh Gökırmak and I tried the program with the kir-tur pair, but
the results were very bad: no rule was applied even though everything
was supposed to be checked. I haven't found the bug yet, but I intend
to next week, as this pair would improve our evaluation analysis,
along with the other two pairs, kaz-tur and spa-eng.

Great. Note that it also may help to add ambiguous rules where
necessary. Unfortunately I don't have time to do it right now, but if
someone could find me contrastive examples, I could certainly write
the rules. The hard part is finding the examples.

- I think we still have a problem with proper lexical selection: after
using it, we still have ambiguous target lexical forms (of which we
choose the first one) and inappropriate lexical forms in some
sentences. So is there a way to disambiguate the lexical selection?
The command we used is

lrx-proc -m spa-eng.autolex.bin biltrans.txt > lextor.txt

And I think that uses some automatic selection, and there is another
tool, lrx-comp, that can compile a rule file with some weights to
select with. Is that right? And what is this rule file?

apertium-eng-spa.spa-eng.lrx should be a file with rules and weights.
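
In case it helps, an .lrx rule file is XML and looks roughly like this
(lemmas invented for illustration; see the apertium-lex-tools
documentation for the exact schema):

  <rules>
    <rule weight="1.0">
      <match lemma="banco" tags="n.*">
        <select lemma="bank"/>
      </match>
    </rule>
  </rules>

lrx-comp compiles a file like that into the .bin that lrx-proc loads.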

Further rules can be trained in an unsupervised manner.

Regards,

Fran


_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
