Hi,

We are currently working on a project to add the Recurrent Neural Network (RNN) algorithm to Machine Learner. RNN is a deep learning algorithm that has achieved record-breaking accuracy on sequential data. For more information on RNNs, please refer to [1].
We have decided to use deeplearning4j, an open source deep learning library that scales on Spark and Hadoop. Since there is a plan to add Spark pipelines to Machine Learner, we have decided to apply the Spark pipeline concept in our project.

I have designed an architecture for the RNN implementation that is compatible with the Spark pipeline. The data set is taken in CSV format and converted to a Spark data frame, since Apache Spark works mostly with data frames. The next step is a transformer that tokenizes the sequential data. A tokenizer takes a sequence of data and breaks it into individual units; for example, it can break a sentence into words. The step after that is another transformer, which converts the tokens to vectors. This is required because features must be added to the Spark pipeline in org.apache.spark.mllib.linalg.VectorUDT format.

Next, the transformed data is fed to the data set iterator, an object of a class that implements org.deeplearning4j.datasets.iterator.DataSetIterator. The data set iterator traverses the data set and prepares the data for the neural network. The next component is the RNN algorithm itself, which is an estimator. The data from the data set iterator is fed to the RNN and a model is generated. This model can then be used for predictions.

We have decided to complete this project in two steps:

- First, create a Spark pipeline program containing the steps in Machine Learner (uploading the data set, generating the model, calculating accuracy and prediction) and check whether the project is feasible.
- Next, add the algorithm to ML.

We have almost completed the first step, and we are now collecting more data and tuning the hyperparameters.

[1] https://docs.google.com/document/d/1edg1fdKCYR7-B1oOLy2kon179GSs6x2Zx9oSRDn_NEU/edit
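To make the data flow described above more concrete, here is a minimal plain-Java sketch of the stages before the estimator: tokenize, convert tokens to vectors, and iterate over the vectors in batches. It is only an illustration under stated assumptions and does not use the Spark or deeplearning4j APIs. The class name RnnPipelineSketch and its methods are hypothetical, and the bag-of-words count encoding is chosen purely for brevity; the actual token-to-vector encoding used in the project may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for the pipeline stages; no Spark or deeplearning4j calls. */
final class RnnPipelineSketch {

    /** Stage 1 (transformer): split a sentence into lower-case word tokens. */
    static List<String> tokenize(String sentence) {
        String trimmed = sentence.trim().toLowerCase();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    /** Assign each distinct token an index, in first-seen order. */
    static Map<String, Integer> buildVocabulary(List<List<String>> tokenized) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        for (List<String> tokens : tokenized) {
            for (String token : tokens) {
                vocab.putIfAbsent(token, vocab.size());
            }
        }
        return vocab;
    }

    /** Stage 2 (transformer): encode tokens as a count vector (assumed encoding). */
    static double[] vectorize(List<String> tokens, Map<String, Integer> vocab) {
        double[] vector = new double[vocab.size()];
        for (String token : tokens) {
            Integer index = vocab.get(token);
            if (index != null) {          // tokens outside the vocabulary are skipped
                vector[index] += 1.0;
            }
        }
        return vector;
    }

    /** Stage 3: traverse the vectors in mini-batches, as a data set iterator would. */
    static Iterator<List<double[]>> batches(List<double[]> vectors, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return new Iterator<List<double[]>>() {
            private int cursor = 0;

            @Override
            public boolean hasNext() {
                return cursor < vectors.size();
            }

            @Override
            public List<double[]> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                int end = Math.min(cursor + batchSize, vectors.size());
                List<double[]> batch = new ArrayList<>(vectors.subList(cursor, end));
                cursor = end;
                return batch;
            }
        };
    }

    public static void main(String[] args) {
        // Made-up rows standing in for the CSV input.
        List<String> rows = Arrays.asList("the cat sat", "the dog sat down");
        List<List<String>> tokenized = new ArrayList<>();
        for (String row : rows) {
            tokenized.add(tokenize(row));
        }
        Map<String, Integer> vocab = buildVocabulary(tokenized);
        List<double[]> vectors = new ArrayList<>();
        for (List<String> tokens : tokenized) {
            vectors.add(vectorize(tokens, vocab));
        }
        Iterator<List<double[]>> it = batches(vectors, 1);
        while (it.hasNext()) {
            for (double[] v : it.next()) {
                System.out.println(Arrays.toString(v));
            }
        }
    }
}
```

In the real pipeline the first two stages would be Spark transformers producing a VectorUDT column, and the third would be a DataSetIterator implementation feeding the RNN estimator; the sketch only mirrors the order and responsibility of each stage.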
_______________________________________________ Architecture mailing list [email protected] https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
