Hi,

I have scheduled a review for Monday (4th of April).

Thanks
On Thu, Mar 31, 2016 at 3:21 PM, Srinath Perera <[email protected]> wrote:
> Please set up a review. Shall we do it on Monday?
>
> On Thu, Mar 31, 2016 at 2:15 PM, Thamali Wijewardhana <[email protected]>
> wrote:
>
>> Hi,
>>
>> We have created a Spark program to prove the feasibility of adding the
>> RNN algorithm to Machine Learner.
>> This program demonstrates all the steps in Machine Learner:
>>
>> Uploading a dataset
>>
>> Selecting the hyperparameters for the model
>>
>> Creating an RNN model from the data and training the model
>>
>> Calculating the accuracy of the model
>>
>> Saving the model (as a serialized object)
>>
>> Predicting using the model
>>
>> This program is based on deeplearning4j and the Apache Spark pipeline.
>> Deeplearning4j was used as the deep learning library for the recurrent
>> neural network algorithm. As the program should be based on the Spark
>> pipeline, the main challenge was to use the deeplearning4j library with
>> the Spark pipeline. Every component used in the pipeline must be
>> compatible with it. Components that are not compatible have to be
>> wrapped in an org.apache.spark.ml.PredictionModel object.
>>
>> We have designed a pipeline with a sequence of stages (transformers and
>> estimators):
>>
>> 1. Tokenizer (transformer): splits each piece of sequential data into
>>    tokens (for example, in sentiment analysis, splits text into words).
>>
>> 2. Vectorizer (transformer): transforms features into vectors.
>>
>> 3. RNN algorithm (estimator): trains on a data frame and produces an
>>    RNN model.
>>
>> 4. RNN model (transformer): transforms a data frame with features into
>>    a data frame with predictions.
>>
>> The diagrams below explain the stages of the pipeline. The first diagram
>> illustrates the training usage of the pipeline and the next diagram
>> illustrates the testing and predicting usage of the pipeline.
>>
>> [Diagrams: pipeline in training use; pipeline in testing and prediction use]
>>
>> I have also tuned the hyperparameters of the RNN model [1] and found the
>> values that optimize the accuracy of the model.
>> Given below is the set of hyperparameters relevant to the RNN algorithm
>> and their tuned values:
>>
>> Number of epochs: 10
>>
>> Number of iterations: 1
>>
>> Learning rate: 0.02
>>
>> We used the aclImdb sentiment analysis data set for this program and,
>> with the above hyperparameters, we could achieve 60% accuracy. We are
>> trying to improve the accuracy and efficiency of our algorithm.
>>
>> [1]
>> https://docs.google.com/spreadsheets/d/1Wcta6i2k4Je_5l16wCVlH6zBMNGIb-d7USaWdbrkrSw/edit?ts=56fcdc9b#gid=2118685173
>>
>> Thanks
>>
>> On Fri, Mar 25, 2016 at 10:18 AM, Thamali Wijewardhana <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> One of the most important obstacles in machine learning and deep
>>> learning is getting data into a format that neural nets can understand.
>>> Neural nets understand vectors. Therefore, vectorization is an important
>>> part of building neural network algorithms.
>>>
>>> Canova is a vectorization library for machine learning that is
>>> associated with the deeplearning4j library. It is designed to support
>>> all major types of input data, such as text, CSV, image, audio, and
>>> video.
>>>
>>> In our project to add RNN to Machine Learner, we have to use a
>>> vectorizing component to convert input data to vectors. I think that
>>> Canova is a better option for building a generic vectorizing component.
>>> I am researching the use of Canova for this purpose.
>>>
>>> Any suggestions on this are highly appreciated.
>>>
>>> Thanks
>>>
>>> On Wed, Mar 2, 2016 at 2:25 PM, Thamali Wijewardhana <[email protected]>
>>> wrote:
>>>
>>>> Hi Srinath,
>>>>
>>>> We have decided to implement only classification first. Once we
>>>> complete the classification, we hope to do next value prediction too.
>>>> We are basically trying to implement a program to make sure that the
>>>> deeplearning4j library we are using is compatible with the Apache Spark
>>>> pipeline. We are also trying to demonstrate all the machine learning
>>>> steps with that program.
>>>>
>>>> We are now using the aclImdb sentiment analysis data set to verify the
>>>> accuracy of the RNN model we create.
>>>>
>>>> Thanks
>>>> Thamali
>>>>
>>>> On Wed, Mar 2, 2016 at 10:38 AM, Srinath Perera <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Thamali,
>>>>>
>>>>> 1. RNN can do both classification and next value prediction. Are we
>>>>>    trying to do both?
>>>>> 2. When Upul played with it, he had trouble getting the deeplearning4j
>>>>>    implementation to work with the next value prediction scenario.
>>>>>    Is it fixed?
>>>>> 3. What are the data sets we will use to verify the accuracy of
>>>>>    RNN after integration?
>>>>>
>>>>> --Srinath
>>>>>
>>>>> On Tue, Mar 1, 2016 at 3:44 PM, Thamali Wijewardhana <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Currently we are working on a project to add the Recurrent Neural
>>>>>> Network (RNN) algorithm to Machine Learner. RNN is one of the deep
>>>>>> learning algorithms with record-breaking accuracy. For more
>>>>>> information on RNN, please refer to link [1].
>>>>>>
>>>>>> We have decided to use deeplearning4j, which is an open source deep
>>>>>> learning library scalable on Spark and Hadoop.
>>>>>>
>>>>>> Since there is a plan to add the Spark pipeline to Machine Learner, we
>>>>>> have decided to apply the Spark pipeline concept to our project.
>>>>>>
>>>>>> I have designed an architecture for the RNN implementation.
>>>>>>
>>>>>> This architecture is developed to be compatible with the Spark pipeline.
>>>>>>
>>>>>> The data set is taken in CSV format and then converted to a Spark
>>>>>> data frame, since Apache Spark works mostly with data frames.
>>>>>>
>>>>>> The next step is a transformer, which is needed to tokenize the
>>>>>> sequential data.
>>>>>> A tokenizer takes a sequence of data and breaks it into individual
>>>>>> units. For example, it can be used to break a sentence into words.
>>>>>>
>>>>>> The next step is again a transformer, used to convert tokens to
>>>>>> vectors. This must be done because the features should be added to
>>>>>> the Spark pipeline in org.apache.spark.mllib.linalg.VectorUDT format.
>>>>>>
>>>>>> Next, the transformed data are fed to the data set iterator. This is
>>>>>> an object of a class that implements
>>>>>> org.deeplearning4j.datasets.iterator.DataSetIterator. The data set
>>>>>> iterator traverses a data set and prepares data for neural networks.
>>>>>>
>>>>>> The next component is the RNN algorithm, which is an estimator. The
>>>>>> iterated data from the data set iterator are fed to the RNN and a
>>>>>> model is generated. This model can then be used for predictions.
>>>>>>
>>>>>> We have decided to complete this project in two steps:
>>>>>>
>>>>>> - First, create a Spark pipeline program containing the steps in
>>>>>>   Machine Learner (uploading a dataset, generating a model,
>>>>>>   calculating accuracy, and prediction) and check whether the
>>>>>>   project is feasible.
>>>>>> - Next, add the algorithm to ML.
>>>>>>
>>>>>> Currently we have almost completed the first step, and now we are
>>>>>> collecting more data and tuning the hyperparameters.
>>>>>>
>>>>>> [1]
>>>>>> https://docs.google.com/document/d/1edg1fdKCYR7-B1oOLy2kon179GSs6x2Zx9oSRDn_NEU/edit
>>>>>>
>>>>>
>>>>> --
>>>>> ============================
>>>>> Srinath Perera, Ph.D.
>>>>> http://people.apache.org/~hemapani/
>>>>> http://srinathsview.blogspot.com/
>>>>>
>>>>
>>>
>>
>
> --
> ============================
> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
> Site: http://home.apache.org/~hemapani/
> Photos: http://www.flickr.com/photos/hemapani/
> Phone: 0772360902
>
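
The stage sequence described in the thread (tokenizer, vectorizer, estimator, fitted model) can be sketched in plain Python roughly as follows. This is only an illustrative stand-in, not the Spark ML or deeplearning4j API: every class and function name is hypothetical, a simple perceptron takes the place of the RNN, the hashing vectorizer is an assumption, and the two-row data set is made up. The default epochs and learning rate reuse the tuned values quoted in the thread.

```python
# Illustrative stand-in only: plain Python, NOT the Spark ML or
# deeplearning4j API. All names below are hypothetical.


class Tokenizer:
    """Transformer: splits each text into lowercase word tokens."""

    def transform(self, rows):
        return [dict(row, tokens=row["text"].lower().split()) for row in rows]


class Vectorizer:
    """Transformer: hashes tokens into a fixed-length count vector."""

    def __init__(self, size=16):
        self.size = size

    def transform(self, rows):
        out = []
        for row in rows:
            vec = [0.0] * self.size
            for tok in row["tokens"]:
                # Deterministic toy hash: sum of the token's bytes.
                vec[sum(tok.encode()) % self.size] += 1.0
            out.append(dict(row, features=vec))
        return out


class PerceptronEstimator:
    """Estimator: stands in for the RNN algorithm; fit() returns a model."""

    def __init__(self, epochs=10, learning_rate=0.02):
        self.epochs = epochs
        self.learning_rate = learning_rate

    def fit(self, rows):
        weights = [0.0] * len(rows[0]["features"])
        bias = 0.0
        for _ in range(self.epochs):
            for row in rows:
                x = row["features"]
                score = bias + sum(w * xi for w, xi in zip(weights, x))
                error = row["label"] - (1 if score > 0 else 0)
                bias += self.learning_rate * error
                weights = [w + self.learning_rate * error * xi
                           for w, xi in zip(weights, x)]
        return PerceptronModel(weights, bias)


class PerceptronModel:
    """Transformer: adds a 'prediction' field to each row."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def transform(self, rows):
        out = []
        for row in rows:
            score = self.bias + sum(
                w * xi for w, xi in zip(self.weights, row["features"]))
            out.append(dict(row, prediction=1 if score > 0 else 0))
        return out


class Pipeline:
    """Applies stages in order; estimators are fitted into models."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)  # estimator -> model (a transformer)
            rows = stage.transform(rows)
            fitted.append(stage)
        return PipelineModel(fitted)


class PipelineModel:
    """Fitted pipeline: a chain of transformers only."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


def accuracy(rows):
    """Fraction of rows whose prediction equals the label."""
    return sum(r["prediction"] == r["label"] for r in rows) / len(rows)


# Made-up two-row data set, for illustration only.
train = [
    {"text": "great movie", "label": 1},
    {"text": "terrible movie", "label": 0},
]
model = Pipeline([Tokenizer(), Vectorizer(), PerceptronEstimator()]).fit(train)
scored = model.transform(train)
print("training accuracy:", accuracy(scored))
```

The point of the sketch is the shape of the design: fitting the pipeline replaces the estimator with the model it produced, so the fitted pipeline consists only of transformers and can be reused unchanged for testing and prediction.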
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
