Hi all,

One of the most important obstacles in machine learning and deep learning
is getting data into a format that neural nets can understand. Neural nets
understand vectors, so vectorization is an important part of building
neural network algorithms.

Canova is a vectorization library for machine learning that is associated
with the deeplearning4j library. It is designed to support all major types
of input data, such as text, CSV, image, audio, and video.

In our project to add RNN to Machine Learner, we have to use a vectorizing
component to convert input data to vectors. I think that Canova is a better
option for building a generic vectorizing component. I am researching how to
use Canova for this purpose.
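
To make the idea concrete, here is a minimal sketch of what a text
vectorizing step does, written in plain Python. It does not use Canova's
API. The function names (tokenize, build_vocabulary, vectorize) and the two
sample reviews are made up for illustration, and a simple bag-of-words count
is assumed as the vector format.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(documents):
    """Map each distinct token to a column index, in order of first appearance."""
    vocabulary = {}
    for document in documents:
        for token in tokenize(document):
            if token not in vocabulary:
                vocabulary[token] = len(vocabulary)
    return vocabulary


def vectorize(text, vocabulary):
    """Return a bag-of-words count vector; tokens outside the vocabulary are skipped."""
    vector = [0.0] * len(vocabulary)
    for token, count in Counter(tokenize(text)).items():
        index = vocabulary.get(token)
        if index is not None:
            vector[index] = float(count)
    return vector


# Hypothetical sample data, standing in for rows of a sentiment data set.
reviews = ["A great movie", "Not a great plot"]
vocabulary = build_vocabulary(reviews)
vectors = [vectorize(review, vocabulary) for review in reviews]
```

Every review ends up as a vector of the same length (one entry per
vocabulary word), which is the kind of fixed-size numeric input a generic
vectorizing component would hand to the next stage.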

Any suggestions on this are highly appreciated.


Thanks



On Wed, Mar 2, 2016 at 2:25 PM, Thamali Wijewardhana <[email protected]>
wrote:

> Hi Srinath,
>
> We have decided to implement only classification first. Once we complete
> the classification, we hope to do next-value prediction too.
> We are basically trying to implement a program to make sure that the
> deeplearning4j library we are using is compatible with the Apache Spark
> pipeline. We are also trying to demonstrate all the machine learning
> steps with that program.
>
> We are now using the aclImdb sentiment analysis data set to verify the
> accuracy of the RNN model we create.
>
> Thanks
> Thamali
>
>
> On Wed, Mar 2, 2016 at 10:38 AM, Srinath Perera <[email protected]> wrote:
>
>> Hi Thamali,
>>
>>
>>    1. An RNN can do both classification and next-value prediction. Are we
>>    trying to do both?
>>    2. When Upul played with it, he had trouble getting the deeplearning4j
>>    implementation to work with the next-value prediction scenario. Is it
>>    fixed?
>>    3. What data sets will we use to verify the accuracy of the RNN
>>    after integration?
>>
>>
>> --Srinath
>>
>> On Tue, Mar 1, 2016 at 3:44 PM, Thamali Wijewardhana <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> Currently we are working on a project to add a Recurrent Neural
>>> Network (RNN) algorithm to Machine Learner. RNN is one of the deep
>>> learning algorithms with record-breaking accuracy. For more information
>>> on RNN, please refer to link [1].
>>>
>>> We have decided to use deeplearning4j, which is an open source deep
>>> learning library that scales on Spark and Hadoop.
>>>
>>> Since there is a plan to add the Spark pipeline to Machine Learner, we
>>> have decided to apply the Spark pipeline concept in our project.
>>>
>>> I have designed an architecture for the RNN implementation.
>>>
>>> This architecture is designed to be compatible with the Spark pipeline.
>>>
>>> The data set is taken in CSV format and then converted to a Spark data
>>> frame, since Apache Spark works mostly with data frames.
>>>
>>> The next step is a transformer that tokenizes the sequential data. A
>>> tokenizer takes a sequence of data and breaks it into individual units.
>>> For example, it can be used to break a sentence into words.
>>>
>>> The next step is another transformer, which converts tokens to vectors.
>>> This must be done because the features should be added to the Spark
>>> pipeline in org.apache.spark.mllib.linalg.VectorUDT format.
>>>
>>> Next, the transformed data are fed to the data set iterator. This is an
>>> object of a class that implements
>>> org.deeplearning4j.datasets.iterator.DataSetIterator. The data set
>>> iterator traverses a data set and prepares data for neural networks.
>>>
>>> The next component is the RNN algorithm model, which is an estimator. The
>>> iterated data from the data set iterator is fed to the RNN and a model is
>>> generated. This model can then be used for predictions.
>>>
>>> We have decided to complete this project in two steps:
>>>
>>>    - First, create a Spark pipeline program containing the steps in
>>>    Machine Learner (uploading the data set, generating the model,
>>>    calculating accuracy, and prediction) and check whether the project
>>>    is feasible.
>>>    - Next, add the algorithm to ML.
>>>
>>> Currently we have almost completed the first step, and we are now
>>> collecting more data and tuning the hyperparameters.
>>>
>>> [1]
>>> https://docs.google.com/document/d/1edg1fdKCYR7-B1oOLy2kon179GSs6x2Zx9oSRDn_NEU/edit
>>>
>>>
>>
>>
>>
>> --
>> ============================
>> Srinath Perera, Ph.D.
>>    http://people.apache.org/~hemapani/
>>    http://srinathsview.blogspot.com/
>>
>
>
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
