Re: [Architecture] Adding RNN to WSO2 Machine Learner

Thamali Wijewardhana Thu, 31 Mar 2016 21:42:53 -0700

Hi,
I have scheduled a review for Monday (4th of April).

Thanks


On Thu, Mar 31, 2016 at 3:21 PM, Srinath Perera <[email protected]> wrote:

> Please set up a review. Shall we do it on Monday?
>
> On Thu, Mar 31, 2016 at 2:15 PM, Thamali Wijewardhana <[email protected]>
> wrote:
>
>> Hi,
>>
>> We have created a Spark program to demonstrate the feasibility of adding
>> the RNN algorithm to Machine Learner.
>> This program covers all the steps in Machine Learner:
>>
>> Uploading a dataset
>>
>> Selecting the hyper parameters for the model
>>
>> Creating an RNN model from the data and training the model
>>
>> Calculating the accuracy of the model
>>
>> Saving the model (as a serialized object)
>>
>> Predicting using the model
>>
>> This program is based on deeplearning4j and the Apache Spark pipeline.
>> Deeplearning4j is used as the deep learning library for the recurrent
>> neural network algorithm. Because the program must be built on the Spark
>> pipeline, the main challenge was using the deeplearning4j library within
>> it. Every component used as a pipeline stage must be compatible with the
>> Spark pipeline. Components that are not compatible have to be wrapped in
>> an org.apache.spark.ml.PredictionModel object.
>>
>> We have designed a pipeline as a sequence of stages (transformers and
>> estimators):
>>
>> 1. Tokenizer (Transformer): splits each sequential data item into tokens
>> (for example, in sentiment analysis, it splits text into words).
>>
>> 2. Vectorizer (Transformer): transforms features into vectors.
>>
>> 3. RNN algorithm (Estimator): trains on a data frame and produces an RNN
>> model.
>>
>> 4. RNN model (Transformer): transforms a data frame with features into a
>> data frame with predictions.
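
A minimal sketch of the four-stage flow above, written in plain Python rather than Spark or deeplearning4j. Every class name here (Tokenizer, Vectorizer, MajorityEstimator, MajorityModel, Pipeline) is a hypothetical stand-in, and the estimator is a trivial majority-class learner, not an RNN. It only illustrates how transformers and an estimator chain together, so that fitting the pipeline yields a model that is itself a transformer.

```python
from collections import Counter


class Tokenizer:
    """Transformer: splits each text into lower-case word tokens."""

    def transform(self, rows):
        return [dict(row, tokens=row["text"].lower().split()) for row in rows]


class Vectorizer:
    """Transformer: maps tokens to a fixed-size bag-of-words count vector."""

    def __init__(self, vocabulary):
        self.index = {word: i for i, word in enumerate(vocabulary)}

    def transform(self, rows):
        out = []
        for row in rows:
            vec = [0] * len(self.index)
            for tok in row["tokens"]:
                if tok in self.index:
                    vec[self.index[tok]] += 1
            out.append(dict(row, features=vec))
        return out


class MajorityModel:
    """Transformer produced by the estimator: adds a 'prediction' column."""

    def __init__(self, label):
        self.label = label

    def transform(self, rows):
        return [dict(row, prediction=self.label) for row in rows]


class MajorityEstimator:
    """Estimator placeholder (NOT an RNN): learns the most common label."""

    def fit(self, rows):
        label = Counter(row["label"] for row in rows).most_common(1)[0][0]
        return MajorityModel(label)


class Pipeline:
    """Runs stages in order; fit() replaces each estimator by its model."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)
            rows = stage.transform(rows)
            fitted.append(stage)
        return Pipeline(fitted)

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


# Made-up training rows, for illustration only.
train = [
    {"text": "good movie", "label": 1},
    {"text": "great good film", "label": 1},
    {"text": "bad movie", "label": 0},
]
pipeline = Pipeline([Tokenizer(), Vectorizer(["good", "bad"]), MajorityEstimator()])
model = pipeline.fit(train)
predictions = model.transform([{"text": "Good good bad"}])
```

The fitted pipeline keeps the tokenizer and vectorizer unchanged and swaps the estimator for the model it produced, which matches the training and prediction usages described in this mail.
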
>>
>> The diagrams below explain the stages of the pipeline. The first diagram
>> illustrates the training usage of the pipeline and the second illustrates
>> the testing and prediction usage of the pipeline.
>>
>> [Diagram: training usage of the pipeline]
>>
>> [Diagram: testing and prediction usage of the pipeline]
>>
>> I have also tuned the hyperparameters of the RNN model [1] and found the
>> values that optimize the accuracy of the model.
>> Given below are the hyperparameters relevant to the RNN algorithm and
>> their tuned values.
>>
>>
>> Number of epochs: 10
>>
>> Number of iterations: 1
>>
>> Learning rate: 0.02
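
To show where these three hyperparameters sit in a training loop, here is a small sketch that fits a one-parameter linear model by gradient descent. It is not the RNN or the deeplearning4j API. The function name and the toy data are made up, and it assumes that "iterations" means the number of parameter updates applied per mini-batch.

```python
def train(data, epochs=10, iterations=1, learning_rate=0.02):
    """Fit y ~ w * x by gradient descent on squared error; returns w."""
    w = 0.0
    for _ in range(epochs):              # full passes over the data set
        for x, y in data:                # one example per "mini-batch"
            for _ in range(iterations):  # assumed: updates per mini-batch
                gradient = 2 * (w * x - y) * x
                w -= learning_rate * gradient
    return w


samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # made-up data with y = 2x
w = train(samples)
```

With the defaults taken from the tuned values listed above, w moves from 0 toward 2 on this toy data. A larger learning rate or more epochs changes how quickly it gets there.
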
>>
>> We used the aclImdb sentiment analysis data set for this program and,
>> with the above hyperparameters, achieved 60% accuracy. We are now
>> trying to improve the accuracy and efficiency of our algorithm.
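
For reference, accuracy here is taken to mean the fraction of test examples whose predicted label matches the true label. A minimal sketch follows. The function name and the sample labels are made up for illustration and are not output from the aclImdb run.

```python
def accuracy(predicted, actual):
    """Return the fraction of positions where predicted equals actual."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Made-up labels: 3 of the 5 predictions match the true labels.
example = accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```
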
>>
>> [1]
>> https://docs.google.com/spreadsheets/d/1Wcta6i2k4Je_5l16wCVlH6zBMNGIb-d7USaWdbrkrSw/edit?ts=56fcdc9b#gid=2118685173
>>
>>
>> Thanks
>>
>>
>>
>> On Fri, Mar 25, 2016 at 10:18 AM, Thamali Wijewardhana <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> One of the biggest obstacles in machine learning and deep learning is
>>> getting data into a format that neural nets can understand. Neural nets
>>> understand vectors, so vectorization is an important part of building
>>> neural network algorithms.
>>>
>>> Canova is a vectorization library for machine learning that is
>>> associated with the deeplearning4j library. It is designed to support all
>>> major types of input data, such as text, CSV, image, audio, and video.
>>>
>>> In our project to add RNN to Machine Learner, we have to use a
>>> vectorizing component to convert input data to vectors. I think Canova is
>>> a good fit for building a generic vectorizing component, and I am
>>> researching how to use it for this purpose.
>>>
>>> Any suggestions on this are highly appreciated.
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> On Wed, Mar 2, 2016 at 2:25 PM, Thamali Wijewardhana <[email protected]>
>>> wrote:
>>>
>>>> Hi Srinath,
>>>>
>>>> We have decided to implement only classification first. Once we
>>>> complete classification, we hope to do next-value prediction too.
>>>> We are basically implementing a program to make sure that the
>>>> deeplearning4j library we are using is compatible with the Apache Spark
>>>> pipeline. We are also trying to demonstrate all the machine learning
>>>> steps with that program.
>>>>
>>>> We are now using the aclImdb sentiment analysis data set to verify the
>>>> accuracy of the RNN model we create.
>>>>
>>>> Thanks
>>>> Thamali
>>>>
>>>>
>>>> On Wed, Mar 2, 2016 at 10:38 AM, Srinath Perera <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Thamali,
>>>>>
>>>>>
>>>>>    1. RNN can do both classification and next-value prediction. Are we
>>>>>    trying to do both?
>>>>>    2. When Upul played with it, he had trouble getting the deeplearning4j
>>>>>    implementation to work with the next-value prediction scenario. Is it
>>>>>    fixed?
>>>>>    3. What data sets will we use to verify the accuracy of RNN after
>>>>>    integration?
>>>>>
>>>>>
>>>>> --Srinath
>>>>>
>>>>> On Tue, Mar 1, 2016 at 3:44 PM, Thamali Wijewardhana <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Currently we are working on a project to add the Recurrent Neural
>>>>>> Network (RNN) algorithm to Machine Learner. RNN is a deep learning
>>>>>> algorithm known for record-breaking accuracy. For more information on
>>>>>> RNN, please refer to link [1].
>>>>>>
>>>>>> We have decided to use deeplearning4j, which is an open source deep
>>>>>> learning library that scales on Spark and Hadoop.
>>>>>>
>>>>>> Since there is a plan to add the Spark pipeline to Machine Learner, we
>>>>>> have decided to apply the Spark pipeline concept in our project.
>>>>>>
>>>>>> I have designed an architecture for the RNN implementation.
>>>>>>
>>>>>> This architecture is designed to be compatible with the Spark pipeline.
>>>>>>
>>>>>> The data set is taken in CSV format and then converted to a Spark data
>>>>>> frame, since Apache Spark works mostly with data frames.
>>>>>>
>>>>>> The next step is a transformer that tokenizes the sequential data. A
>>>>>> tokenizer takes a sequence of data and breaks it into individual
>>>>>> units. For example, it can be used to break a sentence into words.
>>>>>>
>>>>>> The next step is another transformer, which converts tokens to vectors.
>>>>>> This must be done because features have to be added to the Spark
>>>>>> pipeline in org.apache.spark.mllib.linalg.VectorUDT format.
>>>>>>
>>>>>> Next, the transformed data are fed to the data set iterator. This is
>>>>>> an object of a class that implements
>>>>>> org.deeplearning4j.datasets.iterator.DataSetIterator. The data set
>>>>>> iterator traverses a data set and prepares data for neural networks.
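
To make the role of the data set iterator concrete, here is a small sketch in plain Python. It is not the deeplearning4j DataSetIterator interface. The function name, the padding scheme, and the mask layout are assumptions chosen for illustration. It walks a data set in mini-batches and pads variable-length sequences so that each batch is rectangular.

```python
def batch_iterator(sequences, labels, batch_size, pad_value=0):
    """Yield (padded_sequences, mask, labels) one mini-batch at a time.

    Sequences in a batch are right-padded to the longest one in that batch;
    the mask marks real time steps with 1 and padding with 0.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(sequences), batch_size):
        chunk = sequences[start:start + batch_size]
        longest = max(len(seq) for seq in chunk)
        padded = [list(seq) + [pad_value] * (longest - len(seq)) for seq in chunk]
        mask = [[1] * len(seq) + [0] * (longest - len(seq)) for seq in chunk]
        yield padded, mask, labels[start:start + batch_size]


# Made-up token-id sequences of different lengths, with binary labels.
batches = list(batch_iterator([[3, 1, 4], [1, 5], [9]], [1, 0, 1], batch_size=2))
```

The mask lets a sequence model ignore padded time steps, which matters when reviews or sentences in the same batch have different lengths.
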
>>>>>>
>>>>>> The next component is the RNN algorithm, which is an estimator. The
>>>>>> data from the data set iterator is fed to the RNN and a model is
>>>>>> generated. This model can then be used for predictions.
>>>>>>
>>>>>> We have decided to complete this project in two steps:
>>>>>>
>>>>>>    - First, create a Spark pipeline program containing the steps in
>>>>>>    Machine Learner (uploading the dataset, generating the model,
>>>>>>    calculating accuracy, and prediction) and check whether the project
>>>>>>    is feasible.
>>>>>>    - Next, add the algorithm to ML.
>>>>>>
>>>>>> We have almost completed the first step and are now collecting more
>>>>>> data and tuning the hyperparameters.
>>>>>>
>>>>>> [1]
>>>>>> https://docs.google.com/document/d/1edg1fdKCYR7-B1oOLy2kon179GSs6x2Zx9oSRDn_NEU/edit
>>>>>>
>>>>>>
>>>>>>
>>>>>> 
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> ============================
>>>>> Srinath Perera, Ph.D.
>>>>>    http://people.apache.org/~hemapani/
>>>>>    http://srinathsview.blogspot.com/
>>>>>
>>>>
>>>>
>>>
>>
>
>
> --
> ============================
> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
> Site: http://home.apache.org/~hemapani/
> Photos: http://www.flickr.com/photos/hemapani/
> Phone: 0772360902
>

_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
