Thamali, how big is the data set you are using? (Please give me a link to the data set as well.)
Nirmal, shall we compare the accuracy of the RNN vs. Upul's rolling window method?

--Srinath

On Fri, Apr 8, 2016 at 9:23 AM, Thamali Wijewardhana <[email protected]> wrote:

> Hi,
>
> I ran the RNN algorithm using the deeplearning4j library and the Keras Python library. The dataset, hyperparameters, network architecture and hardware platform were the same in both runs. Given below is the time comparison:
>
> Deeplearning4j library: 40 minutes per epoch
> Keras library: 4 minutes per epoch
>
> I also compared the accuracies [1]. The deeplearning4j library gives a lower accuracy than the Keras library.
>
> [1] https://docs.google.com/spreadsheets/d/1-EvC1P7N90k1S_Ly6xVcFlEEKprh7r41Yk8aI6DiSaw/edit#gid=1050346562
>
> Thanks
>
> On Fri, Apr 1, 2016 at 10:12 AM, Thamali Wijewardhana <[email protected]> wrote:
>
>> Hi,
>> I have organized a review on Monday (4th of April).
>>
>> Thanks
>>
>> On Thu, Mar 31, 2016 at 3:21 PM, Srinath Perera <[email protected]> wrote:
>>
>>> Please set up a review. Shall we do it on Monday?
>>>
>>> On Thu, Mar 31, 2016 at 2:15 PM, Thamali Wijewardhana <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> We have created a Spark program to prove the feasibility of adding the RNN algorithm to Machine Learner. This program demonstrates all the steps in Machine Learner:
>>>>
>>>> Uploading a dataset
>>>> Selecting the hyperparameters for the model
>>>> Creating an RNN model using data and training the model
>>>> Calculating the accuracy of the model
>>>> Saving the model (as a serialized object)
>>>> Predicting using the model
>>>>
>>>> This program is based on deeplearning4j and the Apache Spark pipeline. Deeplearning4j was used as the deep learning library for the recurrent neural network algorithm. As the program should be based on the Spark pipeline, the main challenge was to use the deeplearning4j library with the Spark pipeline.
>>>> The components used in the Spark pipeline must be compatible with it. Components that are not compatible have to be wrapped in an org.apache.spark.ml.PredictionModel object.
>>>>
>>>> We have designed a pipeline with a sequence of stages (transformers and estimators):
>>>>
>>>> 1. Tokenizer (Transformer): Splits each sequential data item into tokens. (For example, in sentiment analysis, it splits text into words.)
>>>> 2. Vectorizer (Transformer): Transforms features into vectors.
>>>> 3. RNN algorithm (Estimator): Trains on a data frame and produces an RNN model.
>>>> 4. RNN model (Transformer): Transforms a data frame with features into a data frame with predictions.
>>>>
>>>> The diagrams below explain the stages of the pipeline. The first diagram illustrates the training usage of the pipeline and the next diagram illustrates the testing and predicting usage of the pipeline.
>>>>
>>>> [Diagram: training usage of the pipeline]
>>>> [Diagram: testing and predicting usage of the pipeline]
>>>>
>>>> I have also tuned the hyperparameters of the RNN model [1] and found the values that optimize the accuracy of the model. Given below are the hyperparameters relevant to the RNN algorithm and their tuned values:
>>>>
>>>> Number of epochs: 10
>>>> Number of iterations: 1
>>>> Learning rate: 0.02
>>>>
>>>> We used the aclImdb sentiment analysis data set for this program and, with the above hyperparameters, we could achieve 60% accuracy. We are trying to improve the accuracy and efficiency of our algorithm.
>>>>
>>>> [1] https://docs.google.com/spreadsheets/d/1Wcta6i2k4Je_5l16wCVlH6zBMNGIb-d7USaWdbrkrSw/edit?ts=56fcdc9b#gid=2118685173
>>>>
>>>> Thanks
>>>>
>>>> On Fri, Mar 25, 2016 at 10:18 AM, Thamali Wijewardhana <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> One of the most important obstacles in machine learning and deep learning is getting data into a format that neural nets can understand. Neural nets understand vectors. Therefore, vectorization is an important part of building neural network algorithms.
>>>>>
>>>>> Canova is a vectorization library for machine learning that is associated with the deeplearning4j library. It is designed to support all major types of input data, such as text, CSV, image, audio and video.
>>>>>
>>>>> In our project to add RNN to Machine Learner, we have to use a vectorizing component to convert input data to vectors. I think that Canova is a better choice for building a generic vectorizing component. I am researching the use of Canova for this purpose.
>>>>>
>>>>> Any suggestions on this are highly appreciated.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Wed, Mar 2, 2016 at 2:25 PM, Thamali Wijewardhana <[email protected]> wrote:
>>>>>
>>>>>> Hi Srinath,
>>>>>>
>>>>>> We have decided to implement only classification first. Once we complete the classification, we hope to do next-value prediction too. We are basically trying to implement a program to make sure that the deeplearning4j library we are using is compatible with the Apache Spark pipeline. We are also trying to demonstrate all the machine learning steps with that program.
>>>>>>
>>>>>> We are now using the aclImdb sentiment analysis data set to verify the accuracy of the RNN model we create.
>>>>>>
>>>>>> Thanks
>>>>>> Thamali
>>>>>>
>>>>>> On Wed, Mar 2, 2016 at 10:38 AM, Srinath Perera <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Thamali,
>>>>>>>
>>>>>>> 1. RNN can do both classification and next-value prediction. Are we trying to do both?
>>>>>>> 2. When Upul played with it, he had trouble getting the deeplearning4j implementation to work with the next-value prediction scenario. Is it fixed?
>>>>>>> 3. What data sets will we use to verify the accuracy of the RNN after integration?
>>>>>>>
>>>>>>> --Srinath
>>>>>>>
>>>>>>> On Tue, Mar 1, 2016 at 3:44 PM, Thamali Wijewardhana <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Currently we are working on a project to add the Recurrent Neural Network (RNN) algorithm to Machine Learner. RNN is one of the deep learning algorithms with record-breaking accuracy. For more information on RNN, please refer to link [1].
>>>>>>>>
>>>>>>>> We have decided to use deeplearning4j, which is an open source deep learning library that scales on Spark and Hadoop.
>>>>>>>>
>>>>>>>> Since there is a plan to add the Spark pipeline to Machine Learner, we have decided to apply the Spark pipeline concept to our project.
>>>>>>>>
>>>>>>>> I have designed an architecture for the RNN implementation. This architecture is designed to be compatible with the Spark pipeline.
>>>>>>>>
>>>>>>>> The data set is taken in CSV format and then converted to a Spark data frame, since Apache Spark works mostly with data frames.
>>>>>>>>
>>>>>>>> The next step is a transformer, which is needed to tokenize the sequential data. A tokenizer is basically used to take a sequence of data and break it into individual units. For example, it can be used to break a sentence into words.
>>>>>>>>
>>>>>>>> The next step is again a transformer, used to convert tokens to vectors.
>>>>>>>> This must be done because the features should be added to the Spark pipeline in org.apache.spark.mllib.linalg.VectorUDT format.
>>>>>>>>
>>>>>>>> Next, the transformed data are fed to the data set iterator. This is an object of a class that implements org.deeplearning4j.datasets.iterator.DataSetIterator. The data set iterator traverses a data set and prepares data for neural networks.
>>>>>>>>
>>>>>>>> The next component is the RNN algorithm, which is an estimator. The iterated data from the data set iterator is fed to the RNN and a model is generated. This model can then be used for predictions.
>>>>>>>>
>>>>>>>> We have decided to complete this project in two steps:
>>>>>>>>
>>>>>>>> - First, create a Spark pipeline program containing the steps in Machine Learner (uploading a dataset, generating a model, calculating accuracy and prediction) and check whether the project is feasible.
>>>>>>>> - Next, add the algorithm to ML.
>>>>>>>>
>>>>>>>> Currently we have almost completed the first step, and we are now collecting more data and tuning the hyperparameters.
>>>>>>>>
>>>>>>>> [1] https://docs.google.com/document/d/1edg1fdKCYR7-B1oOLy2kon179GSs6x2Zx9oSRDn_NEU/edit
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ============================
>>>>>>> Srinath Perera, Ph.D.
>>>>>>> http://people.apache.org/~hemapani/
>>>>>>> http://srinathsview.blogspot.com/
>>>
>>> --
>>> ============================
>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>> Site: http://home.apache.org/~hemapani/
>>> Photos: http://www.flickr.com/photos/hemapani/
>>> Phone: 0772360902

--
============================
Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
Site: http://home.apache.org/~hemapani/
Photos: http://www.flickr.com/photos/hemapani/
Phone: 0772360902
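
For reference, the four pipeline stages described in the thread (tokenizer and vectorizer as transformers, an estimator whose fit step produces a model, and the model acting as a transformer that adds predictions) can be illustrated with a minimal, dependency-free Python sketch. This is not Spark or deeplearning4j code. All class names are hypothetical, the vectorizer is a simple bag-of-words counter, and a per-label word-count scorer stands in for the RNN, purely to show how the stages connect and where accuracy is calculated. Model serialization is left out.

```python
class Tokenizer:
    """Transformer: splits each text into lowercase word tokens."""
    def transform(self, rows):
        return [{**r, "tokens": r["text"].lower().split()} for r in rows]


class Vectorizer:
    """Transformer: maps tokens to a fixed-length word-count vector."""
    def __init__(self, vocabulary):
        self.index = {word: i for i, word in enumerate(vocabulary)}

    def transform(self, rows):
        out = []
        for r in rows:
            vec = [0] * len(self.index)
            for token in r["tokens"]:
                if token in self.index:
                    vec[self.index[token]] += 1
            out.append({**r, "features": vec})
        return out


class StandInEstimator:
    """Estimator: fit() returns a model. A simple scorer stands in for the RNN."""
    def fit(self, rows):
        totals = {}
        for r in rows:
            acc = totals.setdefault(r["label"], [0] * len(r["features"]))
            for i, value in enumerate(r["features"]):
                acc[i] += value
        return StandInModel(totals)


class StandInModel:
    """Transformer: adds a 'prediction' field to each row."""
    def __init__(self, totals):
        self.totals = totals

    def transform(self, rows):
        out = []
        for r in rows:
            scores = {label: sum(w * x for w, x in zip(weights, r["features"]))
                      for label, weights in self.totals.items()}
            out.append({**r, "prediction": max(scores, key=scores.get)})
        return out


class Pipeline:
    """Runs stages in order; estimators are fitted and replaced by their models."""
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)      # estimator -> model
            rows = stage.transform(rows)     # feed output to the next stage
            fitted.append(stage)
        return PipelineModel(fitted)


class PipelineModel:
    """Fitted pipeline: every stage is now a transformer."""
    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


def accuracy(rows):
    """Fraction of rows whose prediction matches the label."""
    return sum(r["prediction"] == r["label"] for r in rows) / len(rows)


# Toy data, invented for illustration only.
train = [
    {"text": "great movie loved it", "label": "pos"},
    {"text": "loved the great acting", "label": "pos"},
    {"text": "terrible boring movie", "label": "neg"},
    {"text": "boring and terrible plot", "label": "neg"},
]
test = [
    {"text": "a great film", "label": "pos"},
    {"text": "boring", "label": "neg"},
]
vocabulary = ["great", "loved", "terrible", "boring"]

pipeline = Pipeline([Tokenizer(), Vectorizer(vocabulary), StandInEstimator()])
model = pipeline.fit(train)           # training usage of the pipeline
predictions = model.transform(test)   # testing and predicting usage
test_accuracy = accuracy(predictions)
print(test_accuracy)
```

In the actual project the third stage would wrap the deeplearning4j network, and the rows would be Spark data frame rows rather than Python dictionaries.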
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
