Hi,

> I have restructured my code, see [1][2]. It is a naive implementation, if
> my logic works.

Why are there two files/implementations? Which one is the correct one? Say,
if I want to try out your implementation, which one should I run?

> When creating an instance of the Stacking class, I need to pass an
> argument of type MLModelConfigurationContext, and the getcontext() method
> doesn't work as I am in a static main method.
> Questions: 1. What is the best way to get the context argument?
>            2. Is the design of the Stacking class fine?


Why would you need the MLModelConfigurationContext and getcontext()
methods? Those are utility methods used at ML server runtime to
temporarily store configurations. Let's not worry about those, as they are
part of the integration phase. First, try to implement the stacking with
native spark-mllib libraries, and re-use methods/components in ML server
*ONLY IF* they are necessary or re-usable (i.e., if some method you need is
already available).
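To make the idea concrete, here is a plain-Java sketch of the step at the heart of stacking: collecting out-of-fold predictions from a base learner to use as meta-features. All names here (BaseLearner, outOfFoldPredictions) are purely illustrative, not carbon-ml or spark-mllib APIs; with spark-mllib, the learner would be one of the existing algorithm wrappers and the lists would be JavaRDDs.

```java
import java.util.*;
import java.util.function.Function;

// Illustrative sketch only: not carbon-ml / spark-mllib API.
class StackingSketch {

    // A base learner trains on (features, labels) and returns a prediction function.
    interface BaseLearner {
        Function<double[], Double> train(List<double[]> x, List<Double> y);
    }

    // For each of k folds: train on the other folds, predict on the held-out fold.
    // The returned list is the learner's out-of-fold prediction for every row,
    // which becomes one meta-feature column for the meta-learner.
    static List<Double> outOfFoldPredictions(BaseLearner learner,
                                             List<double[]> x, List<Double> y, int k) {
        int n = x.size();
        List<Double> preds = new ArrayList<>(Collections.nCopies(n, 0.0));
        for (int fold = 0; fold < k; fold++) {
            List<double[]> trainX = new ArrayList<>();
            List<Double> trainY = new ArrayList<>();
            List<Integer> holdout = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if (i % k == fold) holdout.add(i);   // held-out fold
                else { trainX.add(x.get(i)); trainY.add(y.get(i)); }
            }
            Function<double[], Double> model = learner.train(trainX, trainY);
            for (int i : holdout) preds.set(i, model.apply(x.get(i)));
        }
        return preds;
    }
}
```

Repeating this for each base learner gives one column per learner; the meta-learner (e.g. Random Forest) is then trained on those columns against the original labels.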

Regards,
Supun

On Tue, Jun 7, 2016 at 8:31 PM, Misgana Negassi <negas...@tf.uni-freiburg.de
> wrote:

>
> Hi Supun,
>
> My dependencies problems are solved, thanks!
>
> I have restructured my code, see [1][2]. It is a naive implementation if
> my logic works.
>
>
> When creating an instance of Stacking class, I need to pass an argument of
> type MLModelConfigurationContext. and getcontext() method doesn't work as I
> am in a static main method.
> Question: 1. What is the best way to get context argument?
>                 2.  Is the design of Stacking class fine?
>
> I appreciate your feedback!
> Misgana
>
>
>
>
>
> [1]
> https://github.com/zemoel/ensemble-methods/blob/master/src/main/java/Stacking.java
> [2]
> https://github.com/zemoel/ensemble-methods/blob/master/src/main/java/ReadCSV.java#L188
>
>
> On 04.06.2016 06:18, Supun Sethunga wrote:
>
> Hi,
>
> Can you check whether you have defined the relevant repositories in the
> pom.xml? If you haven't, please do so as in [1].
>
> If that didn't work out, can you try checking out the source code of [2],
> building it locally, and then building your code?
>
> [1] https://github.com/wso2/carbon-ml/blob/master/pom.xml#L59
> [2]
> https://github.com/wso2/carbon-metrics/tree/v1.1.0/components/org.wso2.carbon.metrics.manager
>
> Regards,
>
> On Fri, Jun 3, 2016 at 7:06 PM, Misgana Negassi <
> negas...@tf.uni-freiburg.de> wrote:
>
>> Hi Supun,
>>
>> This [1] is the predict() method invoked.
>>
>> I added the dependencies, ran mvn clean install, and am now trying to
>> debug this error:
>>
>>
>>
>> *Failed to execute goal on project ensemble-methods: Could not resolve
>> dependencies for project
>> org.wso2.carbon.ml:ensemble-methods:jar:1.0-SNAPSHOT: The following
>> artifacts could not be resolved:
>> org.wso2.carbon.metrics:org.wso2.carbon.metrics.manager:jar:1.1.0*
>>
>> Best
>>
>>
>>
>>
>> [1]
>> https://github.com/wso2/carbon-ml/blob/master/components/ml/org.wso2.carbon.ml.core/src/main/java/org/wso2/carbon/ml/core/impl/Predictor.java#L80
>>
>> On 03.06.2016 14:11, Supun Sethunga wrote:
>>
>> Hi,
>>
>> Can you specify which predict() method, to be precise?
>>
>> Anyway, looking at the error trace, this seems to be a version mismatch
>> for the dependency "org.wso2.carbon.metrics". Can you try adding v1.1.0?
>> i.e.:
>> <dependency>
>>   <groupId>org.wso2.carbon.metrics</groupId>
>>   <artifactId>org.wso2.carbon.metrics.manager</artifactId>
>>   <version>1.1.0</version>
>> </dependency>
>> Regards,
>>
>> On Fri, Jun 3, 2016 at 5:21 PM, Misgana Negassi <
>> negas...@tf.uni-freiburg.de> wrote:
>>
>>> Hi Supun,
>>> I get this error when calling predict() method.
>>>
>>> Exception in thread "main" java.lang.NoClassDefFoundError:
>>> org/wso2/carbon/metrics/manager/Level
>>>     at
>>> org.wso2.carbon.ml.core.impl.Predictor.getTimer(Predictor.java:301)
>>>     at org.wso2.carbon.ml.core.impl.Predictor.predict(Predictor.java:83)
>>>     at testing.main(testing.java:27)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>     at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:497)
>>>     at
>>> com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
>>> Caused by: java.lang.ClassNotFoundException:
>>> org.wso2.carbon.metrics.manager.Level
>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>
>>>
>>> I added the module dependency "org.wso2.carbon.metrics", but the error
>>> still persists. I'd appreciate your help!
>>>
>>> Best,
>>> Misgana
>>>
>>>
>>> On 02.06.2016 10:49, Supun Sethunga wrote:
>>>
>>> Hi,
>>>
>>> A sample of how to invoke the predict() method can be found in the
>>> links ([1], [2], [3], in order) I've shared above.
>>>
>>> Regards,
>>> Supun
>>>
>>> On Thu, Jun 2, 2016 at 2:13 PM, Misgana Negassi <
>>> negas...@tf.uni-freiburg.de> wrote:
>>>
>>>> Hi Supun,
>>>>
>>>> My current implementation is to invoke train/test method of the
>>>> Algorithms classes.
>>>>
>>>> It would be nice if you can give a small example how to invoke this
>>>> predict() method in getting predictions of basemodels?
>>>>
>>>> Thanks!
>>>>
>>>> Best,
>>>> Misgana
>>>>
>>>> On 02.06.2016 07:12, Supun Sethunga wrote:
>>>>
>>>> Hi,
>>>>
>>>>> As per your suggestion, I will abandon this approach and implement
>>>>> using e.g. Random Forest as the meta-learner.
>>>>
>>>> Ack.
>>>>
>>>> The predictions of base-models are of JavaPairRDD<Double,Double> which
>>>>> is a tuple of predictions and
>>>>> corresponding labels. Am I right?
>>>>
>>>> Well, the predictions of base models are a List of class labels (if it
>>>> is a classification), i.e. List<?>. Please check the predict() methods
>>>> [1], [2] and [3]. Then you'll get a better understanding of how the
>>>> flow of predictions happens, and what data types are returned.
>>>> JavaPairRDD<Double,Double> are returned when we use the test() method
>>>> [3]. That returns the predicted class (encoded value of that class, which
>>>> is a double), and the actual class (again the encoded value, which is a
>>>> double).
>>>>
>>>> I guess we have to use the predict() method (rather than the test()
>>>> method), because we can use that same method for both testing
>>>> (evaluating) and predicting, when using the stacking method.
>>>>
>>>>
>>>> I have used many approaches to convert this to a matrix I can use for
>>>>> training meta-algorithm.
>>>>> Do you have a more efficient idea/hint to convert
>>>>> JavaPairRDD<Double,Double> to JavaRDD<LabeledPoint> before I go crazy with
>>>>> workarounds?
>>>>
>>>> Now that we are going to use the predict() method as above, what we get
>>>> from each base-algorithm is a List<?>. We need to combine multiple
>>>> List<?>s (one list from each base learner) to form a single
>>>> JavaRDD<LabeledPoint>.
>>>> I haven't tried anything similar, but I can think of two approaches:
>>>>
>>>>    - First combine the multiple Lists to create a single List, and
>>>>    then convert it to a JavaRDD<> [5].
>>>>    - *OR* convert each List to a JavaRDD first and then combine those
>>>>    JavaRDDs to form the single JavaRDD<LabeledPoint>
>>>>
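The first approach above could be sketched as follows (plain Java; names are illustrative). The combining step is ordinary list manipulation, and only the final conversion needs Spark, via sc.parallelize(rows) [5]:

```java
import java.util.*;

// Illustrative sketch: combine per-base-learner prediction lists into rows.
class MetaFeatures {

    // basePreds holds one List<Double> per base learner, all the same length
    // and aligned with labels. Each output row is [label, pred1, pred2, ...];
    // row i maps to new LabeledPoint(row[0], Vectors.dense(rest)), and the
    // whole List can then be turned into a JavaRDD with sc.parallelize(rows).
    static List<double[]> combine(List<Double> labels, List<List<Double>> basePreds) {
        List<double[]> rows = new ArrayList<>();
        for (int i = 0; i < labels.size(); i++) {
            double[] row = new double[basePreds.size() + 1];
            row[0] = labels.get(i);
            for (int j = 0; j < basePreds.size(); j++) {
                row[j + 1] = basePreds.get(j).get(i);
            }
            rows.add(row);
        }
        return rows;
    }
}
```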
>>>> [1]
>>>> https://github.com/wso2/carbon-ml/blob/master/components/ml/org.wso2.carbon.ml.rest.api/src/main/java/org/wso2/carbon/ml/rest/api/ModelApiV20.java#L241
>>>> [2]
>>>> https://github.com/wso2/carbon-ml/blob/master/components/ml/org.wso2.carbon.ml.core/src/main/java/org/wso2/carbon/ml/core/impl/MLModelHandler.java#L632
>>>> [3]
>>>> https://github.com/wso2/carbon-ml/blob/master/components/ml/org.wso2.carbon.ml.core/src/main/java/org/wso2/carbon/ml/core/impl/Predictor.java#L80
>>>> [4]
>>>> https://github.com/wso2/carbon-ml/blob/master/components/ml/org.wso2.carbon.ml.core/src/main/java/org/wso2/carbon/ml/core/spark/algorithms/RandomForestClassifier.java#L61
>>>> [5]
>>>> https://spark.apache.org/docs/latest/api/java/org/apache/spark/api/java/JavaSparkContext.html#parallelize(java.util.List)
>>>>
>>>>
>>>> Regards,
>>>> Supun
>>>>
>>>> On Wed, Jun 1, 2016 at 7:15 PM, Misgana Negassi <
>>>> negas...@tf.uni-freiburg.de> wrote:
>>>>
>>>>>
>>>>> Hi Supun,
>>>>>
>>>>> I was implementing the paper [1]; that's why I wrote a method to
>>>>> compute weights using a quadratic programming solver (btw, I checked
>>>>> that the third-party libraries are open source).
>>>>>
>>>>> As per your suggestion, I will abandon this approach and implement
>>>>> using e.g. Random Forest as the meta-learner.
>>>>>
>>>>> I also have a question:
>>>>>
>>>>> The predictions of base-models are of JavaPairRDD<Double,Double> which
>>>>> is a tuple of predictions and
>>>>> corresponding labels. Am I right?
>>>>>
>>>>> I have used many approaches to convert this to a matrix I can use for
>>>>> training meta-algorithm.
>>>>> Do you have a more efficient idea/hint to convert
>>>>> JavaPairRDD<Double,Double> to JavaRDD<LabeledPoint> before I go crazy with
>>>>> workarounds?
>>>>>
>>>>> Best,
>>>>> Misgana
>>>>>
>>>>>
>>>>>
>>>>> [1] https://arxiv.org/pdf/1105.5466.pdf
>>>>>
>>>>> On 01.06.2016 14:41, Supun Sethunga wrote:
>>>>>
>>>>> Hi Misgana,
>>>>>
>>>>> I went through your current implementation for stacking. Good to see
>>>>> you have made progress!
>>>>>
>>>>> However, in your implementation, I noticed that you are calculating
>>>>> weights to combine the base-algorithms. IMO, we don't need to
>>>>> calculate weights for stacking. We can simply combine the results of
>>>>> the base-learners using another well-known algorithm (aka
>>>>> meta-algorithm/meta-learner), such as Random Forest, Decision Tree,
>>>>> etc.
>>>>> For Bagging, you may calculate the weights for combining the
>>>>> base-models.
>>>>>
>>>>> Also, please note that, when using external libraries / third-party
>>>>> dependencies, they have to have an open-source licence (e.g., Apache
>>>>> 2.0 or MIT).
>>>>>
>>>>> Regards,
>>>>> Supun
>>>>>
>>>>> On Wed, Jun 1, 2016 at 9:29 AM, Supun Sethunga <sup...@wso2.com> wrote:
>>>>>
>>>>>> Hi Misgana,
>>>>>>
>>>>>> Any update on the progress?
>>>>>>
>>>>>> Regards,
>>>>>> Supun
>>>>>>
>>>>>> On Fri, May 27, 2016 at 4:49 PM, Supun Sethunga <sup...@wso2.com> wrote:
>>>>>>
>>>>>>> Hi Misgana,
>>>>>>>
>>>>>>> carbon-ml uses Spark 1.4.1 [1]. And yes, as you have mentioned, the
>>>>>>> probabilistic classification algorithms return the predicted class
>>>>>>> (and its encoded numeric value), but not the probability.
>>>>>>>
>>>>>>> Unfortunately, we can't bump the Spark version to 1.6.1 at this
>>>>>>> point, as the new Spark algorithms use DataFrames (whereas 1.4.1
>>>>>>> uses RDDs), and hence it would require a huge refactoring of the
>>>>>>> entire carbon-ml code. This will not be a feasible solution, given
>>>>>>> the project timelines.
>>>>>>>
>>>>>>> Therefore, we might have no other option but to go with the
>>>>>>> predictions (class label / encoded value) we have. Let's not worry
>>>>>>> about the accuracy for now (unless there's a huge difference). We
>>>>>>> can have it as a future improvement.
>>>>>>>
>>>>>>> [1] https://github.com/wso2/carbon-ml/blob/master/pom.xml#L654
>>>>>>>
>>>>>>>
>>>>>>> On Fri, May 27, 2016 at 1:42 PM, Misgana Negassi <
>>>>>>> negas...@tf.uni-freiburg.de> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Hi Supun,
>>>>>>>>
>>>>>>>> My implementation of Stacking expects predictions of class
>>>>>>>> probabilities. But the predictions I get from building the
>>>>>>>> algorithms are not probabilistic, although the classification
>>>>>>>> algorithms extend this [1] class (as of Spark 1.6.1).
>>>>>>>> My Question:
>>>>>>>>
>>>>>>>> 1. Does carbon-ml use the latest Spark Java API package? If not, I
>>>>>>>> can't invoke the predictProbability method on it.
>>>>>>>>
>>>>>>>> 2. If it doesn't support probabilistic predictions, my solution:
>>>>>>>> create a naive method which converts predictions to be
>>>>>>>> probabilistic; later, maybe we can think of doing something smarter
>>>>>>>> with it.
>>>>>>>>
>>>>>>>> 3. If not, we work with the predictions we have. Though, the paper
>>>>>>>> I am implementing clearly says that ensembling confidences of
>>>>>>>> predictions yields better predictions for classification tasks.
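One way the naive conversion proposed in (2) could look (a hypothetical sketch, not carbon-ml code): map each hard class prediction to a one-hot vector, i.e. put all probability mass on the predicted class:

```java
// Hypothetical sketch of a naive predictions-to-"probabilities" conversion.
class NaiveProbabilities {

    // Place probability 1.0 on the predicted class and 0.0 elsewhere.
    // This loses the genuine confidence information a real predictProbability
    // would give, but has the right shape for a probability-based ensembler.
    static double[] toProbabilities(int predictedClass, int numClasses) {
        double[] p = new double[numClasses];
        p[predictedClass] = 1.0;
        return p;
    }
}
```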
>>>>>>>>
>>>>>>>> Grateful for your ideas!
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Misgana
>>>>>>>>
>>>>>>>> [1] https://spark.apache.org/docs/1.6.1/api/java/index.html
>>>>>>>>
>>>>>>>> On 23.05.2016 15:21, Misgana Negassi wrote:
>>>>>>>>
>>>>>>>> Awesome!
>>>>>>>>
>>>>>>>> Yes, I decided to implement Stacking following ideas presented in
>>>>>>>> this paper[1],  which is to implement a meta-algorithm which combines
>>>>>>>> predictions of class probabilities using least-squares linear 
>>>>>>>> regression
>>>>>>>> with non-negativity constraint adapted for classification tasks.
>>>>>>>>
>>>>>>>> I will basically follow the coding style of "Algorithms"
>>>>>>>> implementations with train and test methods. I will commit my updates 
>>>>>>>> and
>>>>>>>> get back to you on questions.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Misgana
>>>>>>>>
>>>>>>>>
>>>>>>>> [1] https://arxiv.org/pdf/1105.5466.pdf
>>>>>>>>
>>>>>>>> On 23.05.2016 08:30, Supun Sethunga wrote:
>>>>>>>>
>>>>>>>> Hi Misgana,
>>>>>>>>
>>>>>>>> I believe you are progressing well and, by now, have a better
>>>>>>>> understanding of the ML code base and what needs to be done.
>>>>>>>>
>>>>>>>> As you may already know, coding starts from today (23rd) onwards
>>>>>>>> according to the GSoC timeline, and we have about one month (till
>>>>>>>> 20th June) until the midterm evaluation. Let's plan to get one
>>>>>>>> ensemble method working end to end, including the UI, by the
>>>>>>>> midterm evaluation. So, have you decided on which method to
>>>>>>>> implement first? (Stacking?)
>>>>>>>>
>>>>>>>> Let's gear up and start coding :)
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Supun
>>>>>>>>
>>>>>>>> On Fri, May 20, 2016 at 9:38 AM, Supun Sethunga <sup...@wso2.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Misgana,
>>>>>>>>>
>>>>>>>>> Sorry for the late response. Please find my answers in-line.
>>>>>>>>>
>>>>>>>>> The easy problem: Is there a method like spark.mllib.util.kFold
>>>>>>>>>> for JavaRDD? Or should I implement it myself. I could not find 
>>>>>>>>>> anything in
>>>>>>>>>> your utils.
>>>>>>>>>
>>>>>>>>> Doesn't it already accept JavaRDDs? Please refer to [1].
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The second problem: Given the folds of the training set, how do I
>>>>>>>>>> get the predictions from the base models? As far as I could tell the
>>>>>>>>>> ModelBuilder does not give me those. I see two possible solutions:
>>>>>>>>>> 1) Create a new interface MachineLearningAlgorithm which provides
>>>>>>>>>> train and predict methods and let each method implement this 
>>>>>>>>>> interface.
>>>>>>>>>> 2) Copy the huge case statement from the ModelBuilder.
>>>>>>>>>> Do you have any preferences? Or more ideas?
>>>>>>>>>
>>>>>>>>> I'm not very clear on how option (1) solves the problem. Can you
>>>>>>>>> explain it a bit? From my understanding, even if we create a new
>>>>>>>>> interface, we will still have to selectively create different
>>>>>>>>> algorithms for each fold, based on the type of algorithm the user
>>>>>>>>> picks, won't we?
>>>>>>>>>
>>>>>>>>> Or else, you can refactor the code, put that case statement into a
>>>>>>>>> utility method, and call that method wherever you need it.
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/util/MLUtils.html#kFold(org.apache.spark.rdd.RDD,%20int,%20int,%20scala.reflect.ClassTag)
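For intuition, what kFold [1] hands back is k (training, validation) pairs. Below is a plain-Java sketch of such splits (round-robin rather than kFold's random sampling, so only illustrative); the real Spark call, hedged against the 1.4 API, is shown in the comment:

```java
import java.util.*;

// Illustrative sketch of the splits MLUtils.kFold produces. With Spark it is
// roughly (signature as of Spark 1.4; a ClassTag is needed when calling from Java):
//   Tuple2<RDD<LabeledPoint>, RDD<LabeledPoint>>[] folds =
//       MLUtils.kFold(data.rdd(), k, seed,
//                     scala.reflect.ClassTag$.MODULE$.apply(LabeledPoint.class));
class KFoldSketch {

    // Returns k pairs; folds.get(f).get(0) is the training part and
    // folds.get(f).get(1) the validation part of fold f.
    static <T> List<List<List<T>>> kFold(List<T> data, int k) {
        List<List<List<T>>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            List<T> train = new ArrayList<>();
            List<T> validation = new ArrayList<>();
            for (int i = 0; i < data.size(); i++) {
                if (i % k == f) validation.add(data.get(i));
                else train.add(data.get(i));
            }
            folds.add(Arrays.asList(train, validation));
        }
        return folds;
    }
}
```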
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Supun
>>>>>>>>>
>>>>>>>>> On Wed, May 18, 2016 at 8:41 PM, Misgana Negassi <
>>>>>>>>> negas...@tf.uni-freiburg.de> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Supun,
>>>>>>>>>>
>>>>>>>>>> for my Stacking implementation I need predictions from other
>>>>>>>>>> models on different folds of the training set. I have two problems.
>>>>>>>>>>
>>>>>>>>>> The easy problem: Is there a method like spark.mllib.util.kFold
>>>>>>>>>> for JavaRDD? Or should I implement it myself. I could not find 
>>>>>>>>>> anything in
>>>>>>>>>> your utils.
>>>>>>>>>>
>>>>>>>>>> The second problem: Given the folds of the training set, how do I
>>>>>>>>>> get the predictions from the base models? As far as I could tell the
>>>>>>>>>> ModelBuilder does not give me those. I see two possible solutions:
>>>>>>>>>>
>>>>>>>>>> 1) Create a new interface MachineLearningAlgorithm which provides
>>>>>>>>>> train and predict methods and let each method implement this 
>>>>>>>>>> interface.
>>>>>>>>>>
>>>>>>>>>> 2) Copy the huge case statement from the ModelBuilder.
>>>>>>>>>>
>>>>>>>>>> Do you have any preferences? Or more ideas?
>>>>>>>>>>
>>>>>>>>>> About the Eclipse error:
>>>>>>>>>> I inserted an ignore tag in the pom.xml. After mvn clean, there
>>>>>>>>>> are no compilation errors anymore.
>>>>>>>>>> But I decided to keep working with IntelliJ IDEA, unless you
>>>>>>>>>> advise against it.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Misgana
>>>>>>>>>>
>>>>>>>>>> On 16.05.2016 14:37, Supun Sethunga wrote:
>>>>>>>>>>
>>>>>>>>>> Can you try doing a "mvn clean"?
>>>>>>>>>>
>>>>>>>>>> Do you get any compilation failures?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Supun
>>>>>>>>>>
>>>>>>>>>> On Thu, May 12, 2016 at 7:48 PM, Misgana Negassi <
>>>>>>>>>> negas...@tf.uni-freiburg.de> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Supun,
>>>>>>>>>>>
>>>>>>>>>>> Thank you for the supportive hangout session.
>>>>>>>>>>>
>>>>>>>>>>> I had one question I forgot to ask.
>>>>>>>>>>>
>>>>>>>>>>> When I was importing carbon-ml as a Maven project into Eclipse,
>>>>>>>>>>> I got this error message:
>>>>>>>>>>>
>>>>>>>>>>> Multiple annotations found at this line:
>>>>>>>>>>> - Plugin execution not covered by lifecycle configuration:
>>>>>>>>>>>   org.apache.felix:maven-scr-plugin:1.7.2:scr (execution:
>>>>>>>>>>>   generate-scr-scrdescriptor, phase: process-classes)
>>>>>>>>>>> - maven-remote-resources-plugin (goal "process") is ignored by
>>>>>>>>>>>   m2e.
>>>>>>>>>>>
>>>>>>>>>>> How did you solve this problem, if you have encountered it?
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Misgana
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 09.05.2016 06:12, Supun Sethunga wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Misgana,
>>>>>>>>>>>
>>>>>>>>>>> I committed the code for reading a csv file. My next task will
>>>>>>>>>>>> be sampling and starting to implement an ensemble method(Stacking).
>>>>>>>>>>>
>>>>>>>>>>> I went through the code. I would like to suggest a small thing:
>>>>>>>>>>> most of the Spark algorithms need JavaRDDs as the input datasets.
>>>>>>>>>>> Hence, reading your file as a JavaRDD<LabeledPoint> is a better
>>>>>>>>>>> approach than reading it as a list of labelled points. Please
>>>>>>>>>>> refer to [1] and [2] for an example.
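A minimal sketch of the per-line parsing such a reader needs (assuming a comma-separated file with the label in the last column, which is an assumption of this sketch; the Spark wrapping is noted in the comment):

```java
// Illustrative sketch of CSV-line parsing for a JavaRDD<LabeledPoint> reader.
// With Spark, this parser would be applied per line, roughly:
//   JavaRDD<LabeledPoint> data = sc.textFile(path).map(line -> {
//       double[] p = CsvToLabeledPoint.parse(line);
//       return new LabeledPoint(p[0],
//           Vectors.dense(java.util.Arrays.copyOfRange(p, 1, p.length)));
//   });
class CsvToLabeledPoint {

    // Returns [label, feature1, feature2, ...], taking the label from the
    // last column (an assumption for this sketch).
    static double[] parse(String line) {
        String[] cols = line.split(",");
        double[] out = new double[cols.length];
        out[0] = Double.parseDouble(cols[cols.length - 1]);
        for (int i = 0; i < cols.length - 1; i++) {
            out[i + 1] = Double.parseDouble(cols[i]);
        }
        return out;
    }
}
```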
>>>>>>>>>>>
>>>>>>>>>>> -  How to decide which models to use for an ensemble and which
>>>>>>>>>>>> parameters?
>>>>>>>>>>>
>>>>>>>>>>> Type of Model/Algorithm has to be a user input. The parameters
>>>>>>>>>>> will depend on the algorithm user picks.
>>>>>>>>>>>
>>>>>>>>>>> - Should the ensemble methods be implemented as a wrapper around
>>>>>>>>>>>> the base-models?
>>>>>>>>>>>
>>>>>>>>>>> Yes.  You can use the existing algorithms in WSO2 Machine
>>>>>>>>>>> Learner, as the base-models. (I have shared that in my previous 
>>>>>>>>>>> mail)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> - Which library to use for matrix operations? Is Apache
>>>>>>>>>>>> commons.math.Linearalgebra ok?
>>>>>>>>>>>
>>>>>>>>>>> Yes, Apache commons.math.* would be fine. In fact, you can use
>>>>>>>>>>> any library with an open-source licence.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> What do you think about a hangout session to clarify stuff and
>>>>>>>>>>>> get to know each other? :)
>>>>>>>>>>>
>>>>>>>>>>> Of course! Please arrange some time slot (Hope it will be IST
>>>>>>>>>>> time zone: GMT+5.30 friendly :) ) and send me a calendar invite.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://github.com/wso2/carbon-ml/blob/master/components/ml/org.wso2.carbon.ml.core/src/main/java/org/wso2/carbon/ml/core/utils/MLUtils.java#L58
>>>>>>>>>>> [2]
>>>>>>>>>>> https://github.com/wso2/carbon-ml/blob/master/components/ml/org.wso2.carbon.ml.core/src/main/java/org/wso2/carbon/ml/core/spark/algorithms/SupervisedSparkModelBuilder.java#L87
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Supun
>>>>>>>>>>>
>>>>>>>>>>> On Sat, May 7, 2016 at 8:46 PM, Misgana Negassi <
>>>>>>>>>>> negas...@tf.uni-freiburg.de> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Supun,
>>>>>>>>>>>> I committed the code for reading a csv file. My next task will
>>>>>>>>>>>> be sampling and starting to implement an ensemble method(Stacking).
>>>>>>>>>>>> I have some questions about:
>>>>>>>>>>>> -  How to decide which models to use for an ensemble and which
>>>>>>>>>>>> parameters?
>>>>>>>>>>>> - Should the ensemble methods be implemented as a wrapper
>>>>>>>>>>>> around the
>>>>>>>>>>>>
>>>>>>>>>>> ...
>
> [Message clipped]




-- 
*Supun Sethunga*
Senior Software Engineer
WSO2, Inc.
http://wso2.com/
lean | enterprise | middleware
Mobile : +94 716546324
_______________________________________________
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev
