Hi, I have a few questions about this:

1. How would you analyse the sentiment of the source text, considering that the language pairs Apertium deals with are low-resource languages?
2. As Tino mentions, is there actually a problem of sentiment loss in Apertium? Any examples of this?
3. Doesn't sentiment analysis of a language require a decent amount of training data? Where would this data be found for low-resource languages?
Tanmai

On Fri, Feb 28, 2020 at 12:02 AM Rajarshi Roychoudhury <rroychoudhu...@gmail.com> wrote:

> The effect won't be very evident on simple sentences; I think it would be
> more effective on sentences where the choice of words can decide the
> quality of the translation. It's not about whether "Watch out" could be
> "be careful", it's about choosing words that retain the urgency of "watch
> out". Sentiment information about the original sentence can help with that.
>
> On Thu, Feb 27, 2020, 23:47 Scoop Gracie <scoopgra...@gmail.com> wrote:
>
>> So, "Watch out!" could become "Be careful"?
>>
>> On Thu, Feb 27, 2020, 10:13 Rajarshi Roychoudhury <rroychoudhu...@gmail.com> wrote:
>>
>>> It is not just about minimizing loss of sentiment; it is about using
>>> that information for better translation. A trivial example: in some
>>> situations a sentence can project a strong sentiment, and a simple
>>> translation may not always yield the best result. However, if we can
>>> use knowledge of the sentiment to choose the words, it might give a
>>> better result.
>>>
>>> As far as the code is concerned, I need to study the source code, or
>>> detailed documentation, before proposing a feasible solution.
>>>
>>> Best,
>>> Rajarshi
>>>
>>> On Thu, Feb 27, 2020, 23:21 Tino Didriksen <m...@tinodidriksen.com> wrote:
>>>
>>>> My first question would be: is this actually a problem for rule-based
>>>> machine translation? I am not a linguist, but given how RBMT works I
>>>> can't really see where sentiment would be lost in the process,
>>>> especially because Apertium is designed for related languages, where
>>>> sentiment is mostly the same. But even for less related languages, it
>>>> would come down to the quality of the source-language analysis.
>>>>
>>>> Beyond that, please learn how Apertium specifically works, not just
>>>> RBMT in general.
>>>> http://wiki.apertium.org/wiki/Documentation is a good start, but our
>>>> IRC channel is the best place to ask technical questions.
>>>>
>>>> One major issue specific to Apertium is that the source information
>>>> is no longer available in the target-generation step.
>>>>
>>>> E.g., since you mention English-Hindi, you could install
>>>> apertium-eng-hin and see how each part of the pipe works. We have
>>>> precompiled binaries for common platforms. Again, see the wiki and IRC.
>>>>
>>>> -- Tino Didriksen
>>>>
>>>> On Thu, 27 Feb 2020 at 08:16, Rajarshi Roychoudhury <rroychoudhu...@gmail.com> wrote:
>>>>
>>>>> Formally, I present my idea in this form. From my understanding of
>>>>> RBMT, an RBMT system contains:
>>>>>
>>>>> - an *SL morphological analyser* - analyses a source-language word
>>>>>   and provides its morphological information;
>>>>> - an *SL parser* - a syntax analyser which analyses source-language
>>>>>   sentences;
>>>>> - a *translator* - translates a source-language word into the target
>>>>>   language;
>>>>> - a *TL morphological generator* - generates appropriate
>>>>>   target-language words for the given grammatical information;
>>>>> - a *TL parser* - composes suitable target-language sentences.
>>>>>
>>>>> I propose a sixth component of the RBMT system: a *sentiment-based
>>>>> TL morphological generator*.
>>>>>
>>>>> I propose that we do word-level sentiment analysis of the source
>>>>> language and the target language. For the time being I want to work
>>>>> on English-Hindi translation.
>>>>> We do not need neural-network-based translation; however, to get
>>>>> the sentiment associated with each word we might use NLTK, or
>>>>> develop a character-level embedding just to find the sentiment
>>>>> associated with each word, and form a dictionary out of it. I have
>>>>> written a paper on this and received good results. So, during the
>>>>> final application development we will just have the dictionary, with
>>>>> no neural-network dependencies. This can easily be done with Python.
>>>>> I just need a good corpus of English and Hindi words (the sentiment
>>>>> datasets are available online).
>>>>>
>>>>> The *sentiment-based TL morphological generator* will generate the
>>>>> list of possible words, and we will take the word whose sentiment is
>>>>> closest to that of the source-language word. This is a novel method
>>>>> that has probably not been applied before, and it might generate
>>>>> better results.
>>>>>
>>>>> Please provide your valuable feedback and suggest any necessary
>>>>> changes that need to be made.
>>>>> Best,
>>>>> Rajarshi

--
*Khanna, Tanmai*
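The selection step Rajarshi proposes, generating candidate target-language words and keeping the one whose sentiment score is closest to the source word's, can be sketched as follows. The lexicon entries and scores below are invented stand-ins for the precomputed dictionary the proposal describes (e.g. one derived offline with NLTK), so the sketch itself has no external dependencies and is not existing Apertium or NLTK API.

```python
# Sketch of the proposed "sentiment-based TL generator" selection step.
# Hypothetical precomputed sentiment scores in [-1, 1]; a real dictionary
# would be built offline (e.g. from NLTK sentiment data), leaving no
# neural-network dependency at translation time.

SL_SENTIMENT = {"watch out": -0.6}   # source-language entries
TL_SENTIMENT = {                     # target-language candidates
    "be careful": -0.3,
    "beware": -0.7,
    "look": 0.0,
}

def pick_closest(source_phrase, candidates):
    """Return the candidate TL word whose sentiment score is closest
    to the SL phrase's score."""
    src = SL_SENTIMENT[source_phrase]
    return min(candidates, key=lambda w: abs(TL_SENTIMENT[w] - src))

print(pick_closest("watch out", ["be careful", "beware", "look"]))
# prints "beware": |-0.7 - (-0.6)| = 0.1 beats "be careful" at 0.3
```

With these made-up scores, "beware" wins over "be careful" because it better preserves the urgency of "watch out", which is exactly the behaviour the proposal is after.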
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff