My first question would be, is this actually a problem for rule-based
machine translation? I am not a linguist, but given how RBMT works I can't
really see where sentiment would be lost in the process, especially
because Apertium is designed for related languages where sentiment is
mostly the same. But even for less related languages, it would be down to
the quality of the source language analysis.

Beyond that, please learn how Apertium specifically works, not just RBMT in
general. http://wiki.apertium.org/wiki/Documentation is a good start, but
our IRC channel is the best place to ask technical questions.

One major issue specific to Apertium is that the source information is no
longer available in the target generation step.
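
To make that concrete, here is a rough illustration (not actual Apertium code;
the Hindi lemmas and tags are invented for the example) of what the generator's
input stream looks like by that point:

import re

# By the generation step the stream holds only target-language lexical units
# such as "^घर<n><sg>$"; nothing in it records whether the English source word
# was "house", "home", or something else entirely.
generation_input = "^घर<n><sg>$ ^बड़ा<adj>$"

for lemma, tags in re.findall(r"\^([^<$]+)((?:<[^>]+>)*)\$", generation_input):
    print(lemma, tags)  # only the TL lemma and its tags are available here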

E.g., since you mention English-Hindi, you could install apertium-eng-hin
and see how each part of the pipe works. We have precompiled binaries for
common platforms. Again, see the wiki and IRC.
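
If you want to script that inspection, here is a quick sketch (assuming the
apertium binary and the eng-hin pair are installed; the debug mode names below
are the conventional ones and may differ for this particular pair):

import subprocess

def run_stage(mode, text):
    # Pipe text through a single Apertium mode and return the raw stream.
    result = subprocess.run(["apertium", mode], input=text,
                            capture_output=True, text=True)
    return result.stdout.strip()

for mode in ["eng-hin-morph", "eng-hin-biltrans", "eng-hin"]:
    print(mode, "=>", run_stage(mode, "I saw a house"))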

-- Tino Didriksen


On Thu, 27 Feb 2020 at 08:16, Rajarshi Roychoudhury <
rroychoudhu...@gmail.com> wrote:

> Formally, I present my idea in this form:
> From my understanding of RBMT,
>
> The RBMT system contains:
>
>    - a *SL morphological analyser* - analyses a source language word and
>    provides the morphological information;
>    - a *SL parser* - is a syntax analyser which analyses source language
>    sentences;
>    - a *translator* - used to translate a source language word into the
>    target language;
>    - a *TL morphological generator* - works as a generator of appropriate
>    target language words for the given grammatical information;
>    - a *TL parser* - works as a composer of suitable target language
>    sentences.
>
> I propose a 6th component of the RBMT system: a *sentiment-based TL
> morphological generator*.
>
> I propose that we do word-level sentiment analysis of the source language
> and target language. For the time being I want to work on English-Hindi
> translation. We do not need a neural-network-based translation; however, to
> get the sentiment associated with each word we might use NLTK, or develop a
> character-level embedding just to find the sentiment associated with each
> word, and form a dictionary out of it. I have written a paper on this and
> received good results. So basically, during the final application
> development we will just have the dictionary, with no neural-network
> dependencies. This can easily be done with Python. I just need a good corpus
> of English and Hindi words (the sentiment datasets are available online).
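>
> As a minimal sketch of the dictionary-building step (assuming NLTK's VADER
> lexicon as the word-level scorer; the word list here is only a placeholder,
> and Hindi would need its own sentiment lexicon):
>
> import nltk
> from nltk.sentiment import SentimentIntensityAnalyzer
>
> nltk.download("vader_lexicon", quiet=True)
> sia = SentimentIntensityAnalyzer()
>
> # Placeholder word list; the real one would come from a proper corpus.
> english_words = ["house", "home", "terrible", "wonderful"]
> sentiment_dict = {w: sia.polarity_scores(w)["compound"] for w in english_words}
> print(sentiment_dict)  # maps each word to its compound sentiment score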
>
> The *sentiment-based TL morphological generator* will generate the list
> of possible words, and we will take the word whose sentiment is closest to
> that of the source language word.
> This is a novel method that has probably not been applied before, and
> might generate better results.
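>
> A rough sketch of that selection step (eng_sent and hin_sent here are
> hypothetical per-word sentiment dictionaries like the one above):
>
> # Pick the target-language candidate whose sentiment score is closest to the
> # source word's score; unknown words default to a neutral 0.0.
> def pick_by_sentiment(src_word, tl_candidates, eng_sent, hin_sent):
>     src_score = eng_sent.get(src_word, 0.0)
>     return min(tl_candidates,
>                key=lambda w: abs(hin_sent.get(w, 0.0) - src_score))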
>
> Please provide your valuable feedback and suggest any necessary changes
> that need to be made.
> Best,
> Rajarshi
>
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
