Hi Thamali,

It would be better if you could share the artifacts you used to run these tests in a public location, ideally with a README.md file listing the steps to follow.

Thanks

On Thu, Apr 21, 2016 at 6:03 PM, Thamali Wijewardhana <[email protected]> wrote:

> Hi,
>
> I have finished writing the article [1] comparing the deeplearning4j
> library with the Keras library for the Recurrent Neural Network (RNN)
> algorithm.
> I have also identified the reasons for the low performance of the
> deeplearning4j library using Java Flight Recorder (JFR) and Flame
> Graphs, and included them in the article.
>
> [1]
> https://docs.google.com/a/wso2.com/document/d/1CGq1y5QBzW6EaHyf-UqAiatxLumb6lo_mRLjYZWD18o/edit?usp=sharing
>
> Thanks
>
> On Fri, Apr 8, 2016 at 7:20 PM, Thamali Wijewardhana <[email protected]>
> wrote:
>
>> Hi,
>>
>> The dataset I used has 25000 rows and is 80 MB in size.
>>
>> The link to the dataset is:
>>
>> http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
>>
>> On Fri, Apr 8, 2016 at 3:07 PM, Srinath Perera <[email protected]> wrote:
>>
>>> Thamali, how big is the data set you are using? (Please give me a
>>> link to the data set as well.)
>>>
>>> Nirmal, shall we compare the accuracy of RNN vs. Upul's rolling
>>> window method?
>>>
>>> --Srinath
>>>
>>> On Fri, Apr 8, 2016 at 9:23 AM, Thamali Wijewardhana <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I ran the RNN algorithm using the deeplearning4j library and the
>>>> Keras Python library. The dataset, hyperparameters, network
>>>> architecture and hardware platform were the same in both runs.
>>>> The time comparison is given below:
>>>>
>>>> Deeplearning4j library: 40 minutes per epoch
>>>> Keras library: 4 minutes per epoch
>>>>
>>>> I also compared the accuracies [1]. The deeplearning4j library
>>>> gives a lower accuracy than the Keras library.
>>>>
>>>> [1]
>>>> https://docs.google.com/spreadsheets/d/1-EvC1P7N90k1S_Ly6xVcFlEEKprh7r41Yk8aI6DiSaw/edit#gid=1050346562
>>>>
>>>> Thanks
>>>>
>>>> On Fri, Apr 1, 2016 at 10:12 AM, Thamali Wijewardhana
>>>> <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>> I have scheduled a review for Monday (4th of April).
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Thu, Mar 31, 2016 at 3:21 PM, Srinath Perera <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Please set up a review. Shall we do it on Monday?
>>>>>>
>>>>>> On Thu, Mar 31, 2016 at 2:15 PM, Thamali Wijewardhana
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> We have created a Spark program to prove the feasibility of
>>>>>>> adding the RNN algorithm to Machine Learner.
>>>>>>> This program demonstrates all the steps in Machine Learner:
>>>>>>>
>>>>>>> - Uploading a dataset
>>>>>>> - Selecting the hyperparameters for the model
>>>>>>> - Creating an RNN model from the data and training the model
>>>>>>> - Calculating the accuracy of the model
>>>>>>> - Saving the model (as a serialized object)
>>>>>>> - Predicting using the model
>>>>>>>
>>>>>>> This program is based on deeplearning4j and the Apache Spark
>>>>>>> pipeline. Deeplearning4j was used as the deep learning library
>>>>>>> for the recurrent neural network algorithm. As the program
>>>>>>> should be based on the Spark pipeline, the main challenge was
>>>>>>> to use the deeplearning4j library with the Spark pipeline.
>>>>>>> Every component used as a pipeline stage has to be compatible
>>>>>>> with the Spark pipeline. Components that are not compatible
>>>>>>> have to be wrapped in an org.apache.spark.ml.PredictionModel
>>>>>>> object.
>>>>>>>
>>>>>>> We have designed a pipeline with a sequence of stages
>>>>>>> (transformers and estimators):
>>>>>>>
>>>>>>> 1. Tokenizer (Transformer): splits each sequential data item
>>>>>>>    into tokens. (For example, in sentiment analysis, splits
>>>>>>>    text into words.)
>>>>>>>
>>>>>>> 2. Vectorizer (Transformer): transforms features into vectors.
>>>>>>>
>>>>>>> 3. RNN algorithm (Estimator): trains on a data frame and
>>>>>>>    produces an RNN model.
>>>>>>>
>>>>>>> 4. RNN model (Transformer): transforms a data frame with
>>>>>>>    features into a data frame with predictions.
>>>>>>>
>>>>>>> The diagrams below explain the stages of the pipeline. The
>>>>>>> first diagram illustrates the training usage of the pipeline
>>>>>>> and the next diagram illustrates the testing and predicting
>>>>>>> usage of the pipeline.
>>>>>>>
>>>>>>> I have also tuned the hyperparameters of the RNN model [1] and
>>>>>>> found the values that optimize the accuracy of the model.
>>>>>>> Given below is the set of hyperparameters relevant to the RNN
>>>>>>> algorithm and the tuned values:
>>>>>>>
>>>>>>> Number of epochs: 10
>>>>>>> Number of iterations: 1
>>>>>>> Learning rate: 0.02
>>>>>>>
>>>>>>> We used the aclImdb sentiment analysis data set for this
>>>>>>> program and, with the above hyperparameters, achieved 60%
>>>>>>> accuracy. We are now trying to improve the accuracy and
>>>>>>> efficiency of our algorithm.
>>>>>>>
>>>>>>> [1]
>>>>>>> https://docs.google.com/spreadsheets/d/1Wcta6i2k4Je_5l16wCVlH6zBMNGIb-d7USaWdbrkrSw/edit?ts=56fcdc9b#gid=2118685173
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Fri, Mar 25, 2016 at 10:18 AM, Thamali Wijewardhana
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> One of the biggest obstacles in machine learning and deep
>>>>>>>> learning is getting data into a format that neural nets can
>>>>>>>> understand. Neural nets understand vectors. Therefore,
>>>>>>>> vectorization is an important part of building neural
>>>>>>>> network algorithms.
>>>>>>>>
>>>>>>>> Canova is a vectorization library for machine learning which
>>>>>>>> is associated with the deeplearning4j library. It is designed
>>>>>>>> to support all major types of input data, such as text, CSV,
>>>>>>>> image, audio and video.
>>>>>>>>
>>>>>>>> In our project to add RNN to Machine Learner, we have to use
>>>>>>>> a vectorizing component to convert input data to vectors. I
>>>>>>>> think Canova is a better option for building a generic
>>>>>>>> vectorizing component. I am researching the use of Canova for
>>>>>>>> this purpose.
>>>>>>>>
>>>>>>>> Any suggestions on this are highly appreciated.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> On Wed, Mar 2, 2016 at 2:25 PM, Thamali Wijewardhana
>>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Srinath,
>>>>>>>>>
>>>>>>>>> We have decided to implement only classification first. Once
>>>>>>>>> we complete the classification, we hope to do next value
>>>>>>>>> prediction too.
>>>>>>>>> We are basically implementing a program to make sure that
>>>>>>>>> the deeplearning4j library we are using is compatible with
>>>>>>>>> the Apache Spark pipeline. We are also trying to demonstrate
>>>>>>>>> all the machine learning steps with that program.
>>>>>>>>>
>>>>>>>>> We are now using the aclImdb sentiment analysis data set to
>>>>>>>>> verify the accuracy of the RNN model we create.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Thamali
>>>>>>>>>
>>>>>>>>> On Wed, Mar 2, 2016 at 10:38 AM, Srinath Perera
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Thamali,
>>>>>>>>>>
>>>>>>>>>> 1. RNN can do both classification and next value
>>>>>>>>>>    prediction. Are we trying to do both?
>>>>>>>>>> 2. When Upul played with it, he had trouble getting the
>>>>>>>>>>    deeplearning4j implementation to work with the next
>>>>>>>>>>    value prediction scenario. Is it fixed?
>>>>>>>>>> 3. What are the data sets we will use to verify the
>>>>>>>>>>    accuracy of RNN after integration?
>>>>>>>>>>
>>>>>>>>>> --Srinath
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 1, 2016 at 3:44 PM, Thamali Wijewardhana
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> We are currently working on a project to add the Recurrent
>>>>>>>>>>> Neural Network (RNN) algorithm to Machine Learner. RNN is
>>>>>>>>>>> one of the deep learning algorithms with record-breaking
>>>>>>>>>>> accuracy. For more information on RNN, please refer to
>>>>>>>>>>> link [1].
>>>>>>>>>>>
>>>>>>>>>>> We have decided to use deeplearning4j, which is an open
>>>>>>>>>>> source deep learning library scalable on Spark and Hadoop.
>>>>>>>>>>>
>>>>>>>>>>> Since there is a plan to add the Spark pipeline to Machine
>>>>>>>>>>> Learner, we have decided to use the Spark pipeline concept
>>>>>>>>>>> in our project.
>>>>>>>>>>>
>>>>>>>>>>> I have designed an architecture for the RNN
>>>>>>>>>>> implementation.
>>>>>>>>>>>
>>>>>>>>>>> This architecture is designed to be compatible with the
>>>>>>>>>>> Spark pipeline.
>>>>>>>>>>>
>>>>>>>>>>> The data set is taken in CSV format and then converted to
>>>>>>>>>>> a Spark data frame, since Apache Spark works mostly with
>>>>>>>>>>> data frames.
>>>>>>>>>>>
>>>>>>>>>>> The next step is a transformer which tokenizes the
>>>>>>>>>>> sequential data. A tokenizer takes a sequence of data and
>>>>>>>>>>> breaks it into individual units. For example, it can be
>>>>>>>>>>> used to break a sentence into words.
>>>>>>>>>>>
>>>>>>>>>>> The next step is again a transformer, used to convert
>>>>>>>>>>> tokens to vectors. This must be done because the features
>>>>>>>>>>> have to be added to the Spark pipeline in
>>>>>>>>>>> org.apache.spark.mllib.linalg.VectorUDT format.
>>>>>>>>>>>
>>>>>>>>>>> Next, the transformed data are fed to the data set
>>>>>>>>>>> iterator. This is an object of a class which implements
>>>>>>>>>>> org.deeplearning4j.datasets.iterator.DataSetIterator. The
>>>>>>>>>>> data set iterator traverses a data set and prepares data
>>>>>>>>>>> for neural networks.
>>>>>>>>>>>
>>>>>>>>>>> The next component is the RNN algorithm, which is an
>>>>>>>>>>> estimator. The iterated data from the data set iterator is
>>>>>>>>>>> fed to the RNN and a model is generated. This model can
>>>>>>>>>>> then be used for predictions.
>>>>>>>>>>>
>>>>>>>>>>> We have decided to complete this project in two steps:
>>>>>>>>>>>
>>>>>>>>>>> - First, create a Spark pipeline program containing the
>>>>>>>>>>>   steps in Machine Learner (uploading the dataset,
>>>>>>>>>>>   generating the model, calculating accuracy and
>>>>>>>>>>>   prediction) and check whether the project is feasible.
>>>>>>>>>>> - Next, add the algorithm to ML.
>>>>>>>>>>>
>>>>>>>>>>> We have almost completed the first step and are now
>>>>>>>>>>> collecting more data and tuning the hyperparameters.
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://docs.google.com/document/d/1edg1fdKCYR7-B1oOLy2kon179GSs6x2Zx9oSRDn_NEU/edit
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> ============================
>>>>>>>>>> Srinath Perera, Ph.D.
>>>>>>>>>> http://people.apache.org/~hemapani/
>>>>>>>>>> http://srinathsview.blogspot.com/
>>>>>>
>>>>>> --
>>>>>> ============================
>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>> Site: http://home.apache.org/~hemapani/
>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>> Phone: 0772360902
>>>
>>> --
>>> ============================
>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>> Site: http://home.apache.org/~hemapani/
>>> Photos: http://www.flickr.com/photos/hemapani/
>>> Phone: 0772360902
>
> _______________________________________________
> Architecture mailing list
> [email protected]
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture

--
*Imesh Gunaratne*
Senior Technical Lead
WSO2 Inc: http://wso2.com
T: +94 11 214 5345 M: +94 77 374 2057
W: http://imesh.io
Lean . Enterprise . Middleware
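P.S. As a rough illustration of the four-stage pipeline described in the thread (tokenizer, vectorizer, estimator, model), here is a minimal, dependency-free Python sketch of the transformer/estimator pattern. It does not use Spark or deeplearning4j. Every class and function name below is hypothetical, the tiny data set is made up for the example, and the "estimator" is a trivial word-weight scorer standing in for the RNN, only to show how the stages chain together.

```python
from typing import List


class Tokenizer:
    """Transformer: splits each text into lowercase word tokens."""

    def transform(self, texts: List[str]) -> List[List[str]]:
        return [text.lower().split() for text in texts]


class Vectorizer:
    """Transformer: maps token lists to fixed-length bag-of-words vectors."""

    def __init__(self, vocabulary: List[str]) -> None:
        self.index = {word: i for i, word in enumerate(vocabulary)}

    def transform(self, token_lists: List[List[str]]) -> List[List[int]]:
        vectors = []
        for tokens in token_lists:
            vec = [0] * len(self.index)
            for tok in tokens:
                if tok in self.index:
                    vec[self.index[tok]] += 1
            vectors.append(vec)
        return vectors


class ToyModel:
    """Transformer produced by the estimator: feature vectors -> predictions."""

    def __init__(self, weights: List[int]) -> None:
        self.weights = weights

    def transform(self, vectors: List[List[int]]) -> List[int]:
        scores = [sum(w * v for w, v in zip(self.weights, vec)) for vec in vectors]
        return [1 if score > 0 else 0 for score in scores]


class ToyEstimator:
    """Estimator: fit() consumes training data and returns a model.

    Stand-in for the RNN (NOT a neural network): each word's weight is its
    count in positive examples minus its count in negative examples.
    """

    def fit(self, vectors: List[List[int]], labels: List[int]) -> ToyModel:
        weights = [0] * len(vectors[0])
        for vec, label in zip(vectors, labels):
            sign = 1 if label == 1 else -1
            for i, value in enumerate(vec):
                weights[i] += sign * value
        return ToyModel(weights)


def accuracy(predictions: List[int], labels: List[int]) -> float:
    """Fraction of predictions that match the labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Made-up training data: 1 = positive sentiment, 0 = negative sentiment.
train_texts = ["great movie", "terrible movie", "great acting", "terrible plot"]
train_labels = [1, 0, 1, 0]

tokenizer = Tokenizer()
vectorizer = Vectorizer(["great", "terrible", "movie", "acting", "plot"])

# Training usage: tokenizer -> vectorizer -> estimator.fit() -> model
train_vectors = vectorizer.transform(tokenizer.transform(train_texts))
model = ToyEstimator().fit(train_vectors, train_labels)

# Prediction usage: tokenizer -> vectorizer -> model.transform()
test_texts = ["great plot", "terrible acting"]
test_labels = [1, 0]
predictions = model.transform(vectorizer.transform(tokenizer.transform(test_texts)))
print(predictions, accuracy(predictions, test_labels))
```

In a real Spark pipeline the same roles would be played by pipeline stages operating on data frames, and the fitted model would be the stage that adds the prediction column.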
