Re: [Apertium-stuff] GSOC 2020 idea

Tanmai Khanna Thu, 27 Feb 2020 11:31:42 -0800

How exactly can characters predict sentiment? Don’t you still need some 
training data for pairs? English, Hindi, Bangla aren’t really low-resource 
languages.


Anyway, let’s continue this discussion on IRC, where it will be easier and 
more people can contribute.

Tanmai

> On 28-Feb-2020, at 00:52, Rajarshi Roychoudhury <rroychoudhu...@gmail.com> 
> wrote:
> 
> 
> To answer the question of how to analyse sentiment in a low-resource language:
> I think character embeddings would be the best option. The vocabulary in a
> corpus is never exhaustive, but the set of unique characters is finite and
> well defined. We can learn an embedding for each character and apply it to a
> number of NLP tasks, not just sentiment analysis. That would slightly reduce
> the disadvantage of working with a low-resource language.
> 
>> On Fri, Feb 28, 2020, 00:46 Rajarshi Roychoudhury <rroychoudhu...@gmail.com> 
>> wrote:
>> As I mentioned earlier, I would like to work on English-Hindi or
>> English-Bengali translation. The dataset can be obtained from SentiWordNet
>> for Indian languages,
>> https://amitavadas.com/sentiwordnet.php
>> which is by far the richest dataset available for sentiment analysis. It
>> contains data for both Hindi and Bengali.
>> 
>> I cannot give an example specific to Apertium because, whenever I try to
>> translate a word from English in the interface, the available target
>> languages are ones I do not know. I am not sure, but Hindi/Bengali is
>> probably not among the languages into which an English word can be
>> translated. Correct me if I am wrong.
>> 
>> 
>> 
>>> On Fri, Feb 28, 2020, 00:31 Tanmai Khanna <khanna.tan...@gmail.com> wrote:
>>> Hi, I have a few questions about this:
>>> 1. How would you analyse the sentiment of the source text, considering that
>>> the language pairs Apertium deals with involve low-resource languages?
>>> 2. As Tino mentions, is there a problem of sentiment loss in Apertium? Any 
>>> examples of this?
>>> 3. Doesn't the sentiment analysis of a language require a decent amount of 
>>> training data? Where would this data be found for low-resource languages?
>>> 
>>> Tanmai
>>> 
>>>> On Fri, Feb 28, 2020 at 12:02 AM Rajarshi Roychoudhury 
>>>> <rroychoudhu...@gmail.com> wrote:
>>>> The effect won't be very evident on simple sentences; I think it would be
>>>> more effective on sentences where the choice of words decides the
>>>> quality of the translation. It's not about whether "Watch out" could
>>>> become "Be careful"; it's about choosing words that retain the urgency of
>>>> "Watch out". Sentiment information about the original sentence can help
>>>> with that.
>>>> 
>>>>> On Thu, Feb 27, 2020, 23:47 Scoop Gracie <scoopgra...@gmail.com> wrote:
>>>>> So, "Watch out!" could become "Be careful"?
>>>>> 
>>>>>> On Thu, Feb 27, 2020, 10:13 Rajarshi Roychoudhury 
>>>>>> <rroychoudhu...@gmail.com> wrote:
>>>>>> It is not just about minimizing loss of sentiment; it is about using
>>>>>> that information for better translation. A very trivial example: in
>>>>>> some situations a sentence projects a strong sentiment, and a simple
>>>>>> translation may not always yield the best result. However, if we can
>>>>>> use knowledge of the sentiment to choose the words, it might give a
>>>>>> better result.
>>>>>> 
>>>>>> As far as the code is concerned, I need to study the source code or
>>>>>> detailed documentation before proposing a feasible solution.
>>>>>> 
>>>>>> Best,
>>>>>> Rajarshi
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Thu, Feb 27, 2020, 23:21 Tino Didriksen <m...@tinodidriksen.com> 
>>>>>>> wrote:
>>>>>>> My first question would be, is this actually a problem for rule-based 
>>>>>>> machine translation? I am not a linguist, but given how RBMT works I 
>>>>>>> can't really see where sentiment would be lost in the process, 
>>>>>>> especially because Apertium is designed for related languages where 
>>>>>>> sentiment is mostly the same. But even for less related languages, it 
>>>>>>> would be down to the quality of the source language analysis.
>>>>>>> 
>>>>>>> Beyond that, please learn how Apertium specifically works, not just 
>>>>>>> RBMT in general. http://wiki.apertium.org/wiki/Documentation is a good 
>>>>>>> start, but our IRC channel is the best place to ask technical questions.
>>>>>>> 
>>>>>>> One major issue specific to Apertium is that the source information is 
>>>>>>> no longer available in the target generation step.
>>>>>>> 
>>>>>>> E.g., since you mention English-Hindi, you could install 
>>>>>>> apertium-eng-hin and see how each part of the pipe works. We have 
>>>>>>> precompiled binaries for common platforms. Again, see wiki and IRC.
>>>>>>> 
>>>>>>> -- Tino Didriksen
>>>>>>> 
>>>>>>> 
>>>>>>>> On Thu, 27 Feb 2020 at 08:16, Rajarshi Roychoudhury 
>>>>>>>> <rroychoudhu...@gmail.com> wrote:
>>>>>>>> Formally, I present my idea as follows.
>>>>>>>> From my understanding of RBMT, the system contains:
>>>>>>>> 
>>>>>>>> 1. an SL morphological analyser, which analyses a source language word
>>>>>>>>    and provides its morphological information;
>>>>>>>> 2. an SL parser, a syntax analyser which analyses source language
>>>>>>>>    sentences;
>>>>>>>> 3. a translator, which translates a source language word into the
>>>>>>>>    target language;
>>>>>>>> 4. a TL morphological generator, which generates appropriate target
>>>>>>>>    language words for the given grammatical information;
>>>>>>>> 5. a TL parser, which composes suitable target language sentences.
>>>>>>>> 
>>>>>>>> I propose a 6th component of the RBMT system: a sentiment-based TL
>>>>>>>> morphological generator.
>>>>>>>> 
>>>>>>>> I propose that we do word-level sentiment analysis of the source
>>>>>>>> language and the target language. For the time being I want to work on
>>>>>>>> English-Hindi translation. We do not need neural-network-based
>>>>>>>> translation; however, to get the sentiment associated with each word
>>>>>>>> we might use NLTK, or develop a character-level embedding just to
>>>>>>>> find the sentiment associated with each word, and build a dictionary
>>>>>>>> from it. I have written a paper on this and received good results. So,
>>>>>>>> basically, the final application will have only the dictionary, with
>>>>>>>> no neural network dependencies. This can easily be done with Python. I
>>>>>>>> just need a good corpus of English and Hindi words (the sentiment
>>>>>>>> datasets are available online).
>>>>>>>> 
>>>>>>>> The sentiment-based TL morphological generator will generate the list
>>>>>>>> of possible words, and we will take the word whose sentiment is
>>>>>>>> closest to that of the source language word.
>>>>>>>> This method has probably not been applied before, and it might
>>>>>>>> generate better results.
>>>>>>>> 
>>>>>>>> Please provide your valuable feedback and suggest any changes that
>>>>>>>> need to be made.
>>>>>>>> Best,
>>>>>>>> Rajarshi
>>>>>>> _______________________________________________
>>>>>>> Apertium-stuff mailing list
>>>>>>> Apertium-stuff@lists.sourceforge.net
>>>>>>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>> 
>>> 
>>> -- 
>>> Khanna, Tanmai

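For reference, the selection step proposed in the original message quoted above (generate the candidate target words, then keep the one whose sentiment is closest to the source word's) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the lexicon format (word mapped to a score in [-1, 1]), and all scores are hypothetical, and English stand-ins are used on both sides rather than real Hindi or Bengali entries. It says nothing about where in the Apertium pipe such a step would sit.

```python
from typing import Mapping, Sequence


def closest_by_sentiment(
    source_word: str,
    candidates: Sequence[str],
    source_lexicon: Mapping[str, float],
    target_lexicon: Mapping[str, float],
    neutral: float = 0.0,
) -> str:
    """Return the candidate whose sentiment score is nearest the source word's.

    Words missing from a lexicon are treated as neutral. Ties go to the
    earliest candidate, so the existing candidate order acts as the default.
    """
    if not candidates:
        raise ValueError("at least one candidate translation is required")
    source_score = source_lexicon.get(source_word, neutral)
    return min(
        candidates,
        key=lambda word: abs(target_lexicon.get(word, neutral) - source_score),
    )


# Hypothetical scores, invented for illustration only.
SOURCE_SCORES = {"furious": -0.9, "fine": 0.3}
TARGET_SCORES = {"annoyed": -0.4, "angry": -0.7, "enraged": -0.95}

print(
    closest_by_sentiment(
        "furious", ["annoyed", "angry", "enraged"], SOURCE_SCORES, TARGET_SCORES
    )
)
```

With these made-up scores the call prints `enraged`, the candidate nearest to the source score of -0.9. At run time this is a plain dictionary lookup with no neural network dependency, which matches the intent stated in the proposal.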
