Hi Misgana,
That's a pretty good job!
> I have integrated a working version of Stacking into the
> SupervisedSparkModelBuilder. My implementation is naive (potential
> unnecessary conversions) but I would keep it to test against a more
> efficient implementation. Please find the details below.
> I also wanted to adapt the graphical user interface but I have no idea
> where to start. Can you give me a hint?
Let's not worry about the performance/efficiency for the moment. We can do
it as an improvement later on, if time permits.
Anyway, were you able to run the standalone one (the one you wrote as a
Java client) on a test dataset?
If it works without any issues, then before we move on to the UI
implementation, shall we test the integration using the REST API? (Even the
UI calls this REST API, so we should be able to do the same operations
using the REST API directly, without the UI. If that works fine too, then
we will move on to the UI part.)
[1] is a built-in sample which builds a Random Forest classification model
end to end, using the REST APIs. Can you use the same APIs in the same
order, and try to build a Stacking model and see if it works fine? Please
note, when you are invoking the REST API, you need to make sure you pass
the correct values, especially for:
- setting model configs - you need to pass not only the algorithm name
("Stacking"), but also the base/meta algorithm names as well.
- setting hyper-params - you need to add hyper-parameters for all the base
and meta algorithms. Hence, the algorithm name should be "Meta_algorithm_x"
or something similar (but *not* "Stacking", *nor* the actual algorithm
name used for base algorithm x, such as "Random Forest").
Also, to achieve the above, you might have to modify the DB schema [2], and
in the table "*Hyper_Parameters*", make *algorithm_name* a primary key as
well. Otherwise, there can be duplicates.
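To make the naming convention concrete, here is a minimal plain-Java sketch
of grouping hyper-parameters under synthetic per-learner names (the
map-based structure and the parameter names are my own illustration, not
the actual carbon-ml API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StackingConfigSketch {

    /** Group hyper-parameters under synthetic learner names such as
     *  "Base_algorithm_1" / "Meta_algorithm_1" (not "Stacking", and not
     *  the underlying algorithm name). Parameter names are hypothetical. */
    public static Map<String, Map<String, String>> buildHyperParams() {
        Map<String, Map<String, String>> hyperParams = new LinkedHashMap<>();

        Map<String, String> base1 = new LinkedHashMap<>();
        base1.put("Num_Trees", "10");
        base1.put("Max_Depth", "5");
        hyperParams.put("Base_algorithm_1", base1);

        Map<String, String> base2 = new LinkedHashMap<>();
        // Same parameter name as base 1: without the per-learner key, these
        // two entries would collide -- the same reason algorithm_name must
        // join the primary key of the Hyper_Parameters table.
        base2.put("Max_Depth", "3");
        hyperParams.put("Base_algorithm_2", base2);

        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("Iterations", "100");
        hyperParams.put("Meta_algorithm_1", meta);

        return hyperParams;
    }

    public static void main(String[] args) {
        System.out.println(buildHyperParams().keySet());
        // -> [Base_algorithm_1, Base_algorithm_2, Meta_algorithm_1]
    }
}
```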
If you have any doubts please feel free to ask.
Also, since the mid-evaluation is near, can you please arrange a session
(Hangouts would be fine) to do a small demo of the work you have done so
far, and run a small sample with stacking (with the standalone Java client
you implemented) to show how it works? It would be great if you can set it
in an IST (GMT+05:30) friendly time slot.
[1]
https://github.com/wso2/product-ml/blob/master/modules/samples/tuned/random-forest-classification/model-generation.sh
[2]
https://docs.wso2.com/display/ML110/Architecture#Architecture-Databasedesign
Thanks,
Supun
On Mon, Jun 13, 2016 at 10:02 PM, Misgana Negassi <
[email protected]> wrote:
> Hi Supun,
>
> I have integrated a working version of Stacking into the
> SupervisedSparkModelBuilder. My implementation is naive (potential
> unnecessary conversions) but I would keep it to test against a more
> efficient implementation. Please find the details below.
>
> I also wanted to adapt the graphical user interface but I have no idea
> where to start. Can you give me a hint?
>
> Best,
> Misgana
>
> DETAILS:
>
> UI LOGIC (my approach):
> I expect a list of base-algorithms together with their parameters from
> the UI. That means, if the user selects Stacking, he will be presented
> with a UI to select the number of base-algorithms to train. After choosing
> a base-algorithm, he will be prompted to select parameters for each
> algorithm, and is then redirected to selecting a meta-algorithm and also
> setting its parameters. This will be serialized and fed to carbon-ml.
> Finally, train.
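> The serialization step could be sketched in plain Java along these lines
> (the delimiter format and the algorithm/parameter names are made up for
> illustration; the actual payload format carbon-ml expects may differ):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class UiConfigSketch {

    /** Flatten the user's selections (base algorithms with their params,
     *  plus the meta algorithm) into one string to hand over to carbon-ml.
     *  The "name:key=value,key=value;..." format is hypothetical. */
    public static String serialize(Map<String, Map<String, String>> selection) {
        StringJoiner algos = new StringJoiner(";");
        for (Map.Entry<String, Map<String, String>> e : selection.entrySet()) {
            StringJoiner params = new StringJoiner(",");
            for (Map.Entry<String, String> p : e.getValue().entrySet()) {
                params.add(p.getKey() + "=" + p.getValue());
            }
            algos.add(e.getKey() + ":" + params);
        }
        return algos.toString();
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> sel = new LinkedHashMap<>();
        Map<String, String> rf = new LinkedHashMap<>();
        rf.put("Num_Trees", "10");
        sel.put("RANDOM_FOREST", rf);          // a base algorithm
        Map<String, String> lr = new LinkedHashMap<>();
        lr.put("Iterations", "100");
        sel.put("LOGISTIC_REGRESSION_META", lr); // the meta algorithm
        System.out.println(serialize(sel));
        // -> RANDOM_FOREST:Num_Trees=10;LOGISTIC_REGRESSION_META:Iterations=100
    }
}
```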
>
> INTEGRATION:
>
> Here is the current work status:
>
>
> 1. buildStackingModel method for SupervisedSparkModelBuilder [1]:
>
> STATUS : Method implementation completed.
>
> LOGIC: We expect a list of base-algorithms together with their
> parameters (serialized). Deserialized, this will be fed to the Stacking
> class to train. To this end, I have added the needed hyperparameters to
> the MLConstants class.
> The next step is to invoke the Stacking test() method and
> build a ModelSummary.
>
> 2. Stacking Class: [2]
>
> STATUS: Naive implementation with working code completed.
>
> LOGIC: This class implements ClassificationModel.
>
> For train(), the idea is to implement the four main
> steps of the stacking logic:
> Step 1. Train the list of base models (level0) on cross-validated
> data. For this, I implemented a BaseModelBuilder class which is similar in
> logic to SupervisedSparkModelBuilder, except that it returns a model of
> the MLModel datatype.
> Step 2. Get the predictions of each base model as a List<?> (using
> the predict() method of the Predictor class) and combine the predictions
> to get the level1 dataset.
> Step 3. Train the meta-algorithm on the level1 dataset.
> Step 4. Train the base-algorithms on the whole dataset and store
> the list of base models.
>
> For test(), we create a level1_test_dataset by combining the
> predictions of the base models (trained on the whole dataset) on the
> level0_test_dataset. Finally, we get the final predictions by invoking the
> Predictor predict() method on the meta-algorithm using level1_test_dataset.
> The output will be of the form JavaPairRDD<Double, Double> (predictions
> and labels).
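> The combination step can be sketched in plain Java as follows (arrays
> stand in for the Spark/carbon-ml types, and the column layout -- one
> column per base model, label last -- is my own assumption):

```java
import java.util.Arrays;
import java.util.List;

public class Level1DatasetSketch {

    /** Combine per-base-model prediction lists into level-1 rows:
     *  row i = [prediction of model 1, ..., prediction of model k, label].
     *  In carbon-ml these rows would then be wrapped as LabeledPoints and
     *  parallelized into a JavaRDD; plain arrays are used here only to
     *  illustrate the combination logic. */
    public static double[][] combine(List<List<Double>> basePredictions,
                                     List<Double> labels) {
        int k = basePredictions.size();   // number of base models
        int n = labels.size();            // number of instances
        double[][] rows = new double[n][k + 1];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < k; j++) {
                rows[i][j] = basePredictions.get(j).get(i);
            }
            rows[i][k] = labels.get(i);   // last column = actual class label
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<Double>> preds = Arrays.asList(
                Arrays.asList(1.0, 0.0, 1.0),   // base model 1
                Arrays.asList(1.0, 1.0, 0.0));  // base model 2
        List<Double> labels = Arrays.asList(1.0, 0.0, 1.0);
        System.out.println(Arrays.deepToString(combine(preds, labels)));
        // -> [[1.0, 1.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
    }
}
```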
>
>
> 3. Refactored Methods/Classes:
>
> MLConstants: [3] Added hyperparameters for Stacking
> Util: [4] Class with helper methods
> BaseModelBuilder: for building models (*will commit once done*)
>
> [1]
> https://github.com/zemoel/carbon-ml/blob/master/components/ml/org.wso2.carbon.ml.core/src/main/java/org/wso2/carbon/ml/core/spark/algorithms/SupervisedSparkModelBuilder.java#L829
> [2]
> https://github.com/zemoel/carbon-ml/blob/master/components/ml/org.wso2.carbon.ml.core/src/main/java/org/wso2/carbon/ml/core/spark/algorithms/Stacking.java
> [3]
> https://github.com/zemoel/carbon-ml/blob/master/components/ml/org.wso2.carbon.ml.commons/src/main/java/org/wso2/carbon/ml/commons/constants/MLConstants.java
> [4]
> https://github.com/zemoel/carbon-ml/blob/master/components/ml/org.wso2.carbon.ml.core/src/main/java/org/wso2/carbon/ml/core/utils/Util.java
>
> On 13.06.2016 05:46, Supun Sethunga wrote:
>
> Hi Misgana,
>
> How is the work going so far? Can you give a brief update on the current
> status, and what is left to be done?
>
> Thanks,
> Supun
>
> On Wed, Jun 8, 2016 at 5:54 PM, Misgana Negassi <
> [email protected]> wrote:
>
>> Hi Supun,
>>
>> Thank you for your suggestions. I would like to abstract away from the
>> concrete level0 models and the concrete level1 model, and needed a
>> suitable interface for that. MLModel seemed a good choice, also with a
>> view to the later integration of stacking.
>>
>> From my point of view, the native Spark models are not suitable because
>> they don't share a suitable interface like MLModel. My approach is to use
>> those native Spark models and convert them to carbon-ml model types,
>> using e.g. MLRandomForest.setModel() for RandomForestModel.
>>
>> Also, I have now found a way to use the Predictor without a
>> configuration context, by passing an empty encoding.
>>
>> I will put in extra effort to meet the deadline and create code which is
>> scalable for later changes/integration. I apologize for any
>> miscommunication that may arise from my side.
>>
>> Best regards,
>> Misgana
>>
>> On 08.06.2016 05:35, Supun Sethunga wrote:
>>
>> Hi,
>>
>> Also, just a gentle reminder: we have just under two weeks until the
>> mid-term evaluation. We need to have some end-to-end working scenario of
>> stacking by then. So let's put in some extra effort and try to complete
>> one scenario.
>>
>> Thanks,
>> Supun
>>
>> On Tue, Jun 7, 2016 at 10:48 PM, Supun Sethunga <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I have restructured my code, see [1][2]. It is a naive implementation if
>>>> my logic works.
>>>
>>> Why are there two files/implementations? Which is the correct one? Say,
>>> if I want to try out your implementation, which one should I run?
>>>
>>> When creating an instance of Stacking class, I need to pass an argument
>>>> of type MLModelConfigurationContext. and getcontext() method doesn't work
>>>> as I am in a static main method.
>>>> Question: 1. What is the best way to get context argument?
>>>> 2. Is the design of Stacking class fine?
>>>
>>>
>>> Why would you need to use the MLModelConfigurationContext and
>>> getContext() methods? Those are utility methods used at ML server
>>> runtime to temporarily store configurations. Let's not worry about
>>> those, as they are part of the integration phase. First, try to
>>> implement stacking with the native spark-mllib libraries, and re-use
>>> methods/components in ML server *ONLY IF* they are necessary or
>>> re-usable (i.e., if some method you need is already available).
>>>
>>> Regards,
>>> Supun
>>>
>>> On Tue, Jun 7, 2016 at 8:31 PM, Misgana Negassi <
>>> [email protected]> wrote:
>>>
>>>>
>>>> Hi Supun,
>>>>
>>>> My dependencies problems are solved, thanks!
>>>>
>>>> I have restructured my code, see [1][2]. It is a naive implementation
>>>> if my logic works.
>>>>
>>>>
>>>> When creating an instance of Stacking class, I need to pass an argument
>>>> of type MLModelConfigurationContext. and getcontext() method doesn't work
>>>> as I am in a static main method.
>>>> Question: 1. What is the best way to get context argument?
>>>> 2. Is the design of Stacking class fine?
>>>>
>>>> I appreciate your feedback!
>>>> Misgana
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> [1]
>>>> https://github.com/zemoel/ensemble-methods/blob/master/src/main/java/Stacking.java
>>>> [2]
>>>> https://github.com/zemoel/ensemble-methods/blob/master/src/main/java/ReadCSV.java#L188
>>>>
>>>>
>>>> On 04.06.2016 06:18, Supun Sethunga wrote:
>>>>
>>>> Hi,
>>>>
>>>> Can you check whether you have defined the relevant repositories in the
>>>> pom.xml? If haven't, please do so as in [1].
>>>>
>>>> If that didn't work out, can you try checking out the source code of
>>>> [2], and build it locally, and then build your code?
>>>>
>>>> [1] https://github.com/wso2/carbon-ml/blob/master/pom.xml#L59
>>>> [2]
>>>> https://github.com/wso2/carbon-metrics/tree/v1.1.0/components/org.wso2.carbon.metrics.manager
>>>>
>>>> Regards,
>>>>
>>>> On Fri, Jun 3, 2016 at 7:06 PM, Misgana Negassi <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Supun,
>>>>>
>>>>> This is [1] the predict() method invoked.
>>>>>
>>>>> I added the dependencies, ran mvn clean install, and am now trying to
>>>>> debug this error:
>>>>>
>>>>>
>>>>>
>>>>> Failed to execute goal on project ensemble-methods: Could not resolve
>>>>> dependencies for project
>>>>> org.wso2.carbon.ml:ensemble-methods:jar:1.0-SNAPSHOT: The following
>>>>> artifacts could not be resolved:
>>>>> org.wso2.carbon.metrics:org.wso2.carbon.metrics.manager:jar:1.1.0
>>>>> Best
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> [1]
>>>>> https://github.com/wso2/carbon-ml/blob/master/components/ml/org.wso2.carbon.ml.core/src/main/java/org/wso2/carbon/ml/core/impl/Predictor.java#L80
>>>>>
>>>>> On 03.06.2016 14:11, Supun Sethunga wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Can you specify which predict() method, to be precise?
>>>>>
>>>>> Anyway, looking at the error trace, this seems to be a version
>>>>> mismatch for the dependency "org.wso2.carbon.metrics". Can you try
>>>>> adding v1.1.0? i.e.:
>>>>>
>>>>> <dependency>
>>>>>     <groupId>org.wso2.carbon.metrics</groupId>
>>>>>     <artifactId>org.wso2.carbon.metrics.manager</artifactId>
>>>>>     <version>1.1.0</version>
>>>>> </dependency>
>>>>>
>>>>> Regards,
>>>>>
>>>>> On Fri, Jun 3, 2016 at 5:21 PM, Misgana Negassi <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Supun,
>>>>>> I get this error when calling predict() method.
>>>>>>
>>>>>> Exception in thread "main" java.lang.NoClassDefFoundError:
>>>>>> org/wso2/carbon/metrics/manager/Level
>>>>>> at
>>>>>> org.wso2.carbon.ml.core.impl.Predictor.getTimer(Predictor.java:301)
>>>>>> at
>>>>>> org.wso2.carbon.ml.core.impl.Predictor.predict(Predictor.java:83)
>>>>>> at testing.main(testing.java:27)
>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>> at
>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>> at
>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>> at java.lang.reflect.Method.invoke(Method.java:497)
>>>>>> at
>>>>>> com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>> org.wso2.carbon.metrics.manager.Level
>>>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>>>
>>>>>>
>>>>>> I added the module dependency "org.wso2.carbon.metrics", but the
>>>>>> error still persists. I appreciate your help!
>>>>>>
>>>>>> Best,
>>>>>> Misgana
>>>>>>
>>>>>>
>>>>>> On 02.06.2016 10:49, Supun Sethunga wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> A sample of how to invoke the predict() method can be found in the
>>>>>> links I've shared above ([1], [2], [3], in order).
>>>>>>
>>>>>> Regards,
>>>>>> Supun
>>>>>>
>>>>>> On Thu, Jun 2, 2016 at 2:13 PM, Misgana Negassi <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi Supun,
>>>>>>>
>>>>>>> My current implementation invokes the train/test methods of the
>>>>>>> algorithm classes.
>>>>>>>
>>>>>>> It would be nice if you could give a small example of how to invoke
>>>>>>> this predict() method to get the predictions of the base models.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Best,
>>>>>>> Misgana
>>>>>>>
>>>>>>> On 02.06.2016 07:12, Supun Sethunga wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> As per your suggestion, I will abandon this approach and implement
>>>>>>>> it using e.g. Random Forest as the meta-learner.
>>>>>>>
>>>>>>> Ack.
>>>>>>>
>>>>>>> The predictions of base-models are of JavaPairRDD<Double,Double>
>>>>>>>> which is a tuple of predictions and
>>>>>>>> corresponding labels. Am I right?
>>>>>>>
>>>>>>> Well, the predictions of the base models are a List of class labels
>>>>>>> (if it is a classification), i.e. List<?>. Please check the
>>>>>>> predict() methods [1], [2] and [3]. Then you'll get a better
>>>>>>> understanding of how the flow of predictions happens, and what data
>>>>>>> types are returned.
>>>>>>> JavaPairRDD<Double, Double> is returned when we use the test()
>>>>>>> method [3]. That returns the predicted class (the encoded value of
>>>>>>> that class, which is a double), and the actual class (again the
>>>>>>> encoded value, which is a double).
>>>>>>>
>>>>>>> I guess we have to use the predict() method (rather than the test()
>>>>>>> method), because we can use that same method for both testing
>>>>>>> (evaluating) and predicting, when using the stacking method.
>>>>>>>
>>>>>>>
>>>>>>> I have used many approaches to convert this to a matrix I can use
>>>>>>>> for training meta-algorithm.
>>>>>>>> Do you have a more efficient idea/hint to convert
>>>>>>>> JavaPairRDD<Double,Double> to JavaRDD<LabeledPoint> before I go crazy
>>>>>>>> with
>>>>>>>> workarounds?
>>>>>>>
>>>>>>> Now that we are going to use the predict() method as above, what
>>>>>>> we get from each base-algorithm is a List<?>. We need to combine the
>>>>>>> multiple List<?>s (one list from each base learner) and form a
>>>>>>> JavaRDD<LabeledPoint>.
>>>>>>> I haven't tried any similar method, but I can think of two approaches:
>>>>>>>
>>>>>>>    - First combine the multiple Lists to create a single List, and
>>>>>>>    then convert it to a JavaRDD<> [5].
>>>>>>>    - *OR* convert each List to a JavaRDD first, and then combine
>>>>>>>    those JavaRDDs to form the single JavaRDD<LabeledPoint>.
>>>>>>>
>>>>>>> [1]
>>>>>>> https://github.com/wso2/carbon-ml/blob/master/components/ml/org.wso2.carbon.ml.rest.api/src/main/java/org/wso2/carbon/ml/rest/api/ModelApiV20.java#L241
>>>>>>> [2]
>>>>>>> https://github.com/wso2/carbon-ml/blob/master/components/ml/org.wso2.carbon.ml.core/src/main/java/org/wso2/carbon/ml/core/impl/MLModelHandler.java#L632
>>>>>>> [3]
>>>>>>> https://github.com/wso2/carbon-ml/blob/master/components/ml/org.wso2.carbon.ml.core/src/main/java/org/wso2/carbon/ml/core/impl/Predictor.java#L80
>>>>>>> [4]
>>>>>>> https://github.com/wso2/carbon-ml/blob/master/components/ml/org.wso2.carbon.ml.core/src/main/java/org/wso2/carbon/ml/core/spark/algorithms/RandomForestClassifier.java#L61
>>>>>>> [5]
>>>>>>> https://spark.apache.org/docs/latest/api/java/org/apache/spark/api/java/JavaSparkContext.html#parallelize(java.util.List)
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Supun
>>>>>>>
>>>>>>> On Wed, Jun 1, 2016 at 7:15 PM, Misgana Negassi <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Hi Supun,
>>>>>>>>
>>>>>>>> I was implementing the paper [1]; that's why I wrote a method to
>>>>>>>> compute weights using a quadratic programming solver (btw, I
>>>>>>>> checked that the third-party libraries are open source).
>>>>>>>>
>>>>>>>> As per your suggestion, I will abandon this approach and implement
>>>>>>>> it using e.g. Random Forest as the meta-learner.
>>>>>>>>
>>>>>>>> I also have a question:
>>>>>>>>
>>>>>>>> The predictions of base-models are of JavaPairRDD<Double,Double>
>>>>>>>> which is a tuple of predictions and
>>>>>>>> corresponding labels. Am I right?
>>>>>>>>
>>>>>>>> I have used many approaches to convert this to a matrix I can use
>>>>>>>> for training meta-algorithm.
>>>>>>>> Do you have a more efficient idea/hint to convert
>>>>>>>> JavaPairRDD<Double,Double> to JavaRDD<LabeledPoint> before I go crazy
>>>>>>>> with
>>>>>>>> workarounds?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Misgana
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> [1] https://arxiv.org/pdf/1105.5466.pdf
>>>>>>>>
>>>>>>>> On 01.06.2016 14:41, Supun Sethunga wrote:
>>>>>>>>
>>>>>>>> Hi Misgana,
>>>>>>>>
>>>>>>>> I went through your current implementation of stacking. Good to
>>>>>>>> see you have made progress!
>>>>>>>>
>>>>>>>> However, in your implementation, I noticed that you are calculating
>>>>>>>> weights to combine the base-algorithms. IMO, we don't need to
>>>>>>>> calculate weights for stacking. We can simply combine the results
>>>>>>>> of the base-learners using another well-known algorithm (aka the
>>>>>>>> meta-algorithm/meta-learner), such as Random Forest, Decision Tree,
>>>>>>>> etc.
>>>>>>>> For Bagging, you may calculate the weights for combining the
>>>>>>>> base-models.
>>>>>>>>
>>>>>>>> Also, please note that when using external libraries/third-party
>>>>>>>> dependencies, they have to have an open-source licence (e.g. Apache
>>>>>>>> 2.0 licence, MIT licence).
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Supun
>>>>>>>>
>>>>>>>> On Wed, Jun 1, 2016 at 9:29 AM, Supun Sethunga <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Misgana,
>>>>>>>>>
>>>>>>>>> Any update on the progress?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Supun
>>>>>>>>>
>>>>>>>>> On Fri, May 27, 2016 at 4:49 PM, Supun Sethunga <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Misgana,
>>>>>>>>>>
>>>>>>>>>> carbon-ml uses Spark 1.4.1 [1]. And yes, as you have mentioned,
>>>>>>>>>> the probabilistic classification algorithms return the predicted
>>>>>>>>>> class (and its encoded numeric value), but not the probability.
>>>>>>>>>>
>>>>>>>>>> Unfortunately, we can't bump the Spark version to 1.6.1 at this
>>>>>>>>>> point, as the new Spark algorithms use DataFrames (whereas 1.4.1
>>>>>>>>>> uses RDDs), and hence it would require a huge refactoring of the
>>>>>>>>>> entire carbon-ml code. This would not be a feasible solution,
>>>>>>>>>> given the project timelines.
>>>>>>>>>>
>>>>>>>>>> Therefore, we might have no other option but to go with the
>>>>>>>>>> predictions (class label / encoded value) we have. Let's not
>>>>>>>>>> worry about the accuracy for now (unless there's a huge
>>>>>>>>>> difference). We can have it as a future improvement.
>>>>>>>>>>
>>>>>>>>>> [1] https://github.com/wso2/carbon-ml/blob/master/pom.xml#L654
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, May 27, 2016 at 1:42 PM, Misgana Negassi <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Supun,
>>>>>>>>>>>
>>>>>>>>>>> My implementation of Stacking expects predictions of class
>>>>>>>>>>> probabilities. But the predictions I get from building the
>>>>>>>>>>> algorithms are not probabilistic, although the classification
>>>>>>>>>>> algorithms extend this [1] class (as of Spark 1.6.1).
>>>>>>>>>>> My questions:
>>>>>>>>>>>
>>>>>>>>>>> 1. Does carbon-ml use the latest Spark Java API package? If it
>>>>>>>>>>> doesn't, I can't invoke the predictProbability method.
>>>>>>>>>>>
>>>>>>>>>>> 2. If it doesn't support probabilistic predictions, my solution:
>>>>>>>>>>> create a naive method which converts predictions to be
>>>>>>>>>>> probabilistic, and later maybe we can think of doing something
>>>>>>>>>>> smarter with it.
>>>>>>>>>>>
>>>>>>>>>>> 3. If not, we work with the predictions we have. Though, the
>>>>>>>>>>> paper I am implementing clearly says that ensembling confidences
>>>>>>>>>>> of predictions yields better predictions for classification
>>>>>>>>>>> tasks.
>>>>>>>>>>>
>>>>>>>>>>> Grateful for your ideas!
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Misgana
>>>>>>>>>>>
>>>>>>>>>>> [1] https://spark.apache.org/docs/1.6.1/api/java/index.html
>>>>>>>>>>>
>>>>>>>>>>> On 23.05.2016 15:21, Misgana Negassi wrote:
>>>>>>>>>>>
>>>>>>>>>>> Awesome!
>>>>>>>>>>>
>>>>>>>>>>> Yes, I decided to implement Stacking following the ideas
>>>>>>>>>>> presented in this paper [1], which is to implement a
>>>>>>>>>>> meta-algorithm which combines predictions of class probabilities
>>>>>>>>>>> using least-squares linear regression with a non-negativity
>>>>>>>>>>> constraint, adapted for classification tasks.
>>>>>>>>>>>
>>>>>>>>>>> I will basically follow the coding style of the "Algorithms"
>>>>>>>>>>> implementations, with train and test methods. I will commit my
>>>>>>>>>>> updates and get back to you with questions.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Misgana
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [1] https://arxiv.org/pdf/1105.5466.pdf
>>>>>>>>>>>
>>>>>>>>>>> On 23.05.2016 08:30, Supun Sethunga wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Misgana,
>>>>>>>>>>>
>>>>>>>>>>> I believe you are progressing well, and by now have a better
>>>>>>>>>>> understanding of the ML code base and what needs to be done.
>>>>>>>>>>>
>>>>>>>>>>> As you may already know, coding starts from today (the 23rd)
>>>>>>>>>>> according to the GSoC timeline, and we have about one month
>>>>>>>>>>> (till 20th June) until the midterm evaluation. Let's plan to get
>>>>>>>>>>> one ensemble method working end to end, including the UI, by the
>>>>>>>>>>> midterm evaluation. So, have you decided which method to
>>>>>>>>>>> implement first? (Stacking?)
>>>>>>>>>>>
>>>>>>>>>>> Let's gear up and start coding :)
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Supun
>>>>>>>>>>>
>>>>>>>>>>> On Fri, May 20, 2016 at 9:38 AM, Supun Sethunga <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Misgana,
>>>>>>>>>>>>
>>>>>>>>>>>> Sorry for the late response. Please find my answers in-line.
>>>>>>>>>>>>
>>>>>>>>>>>> The easy problem: Is there a method like spark.mllib.util.kFold
>>>>>>>>>>>>> for JavaRDD? Or should I implement it myself. I could not find
>>>>>>>>>>>>> anything in
>>>>>>>>>>>>> your utils.
>>>>>>>>>>>>
>>>>>>>>>>>> Doesn't it already accept JavaRDDs? Please refer to [1].
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The second problem: Given the folds of the training set, how do
>>>>>>>>>>>>> I get the predictions from the base models? As far as I could
>>>>>>>>>>>>> tell the
>>>>>>>>>>>>> ModelBuilder does not give me those. I see two possible solutions:
>>>>>>>>>>>>> 1) Create a new interface MachineLearningAlgorithm which
>>>>>>>>>>>>> provides train and predict methods and let each method implement
>>>>>>>>>>>>> this
>>>>>>>>>>>>> interface.
>>>>>>>>>>>>> 2) Copy the huge case statement from the ModelBuilder.
>>>>>>>>>>>>> Do you have any preferences? Or more ideas?
>>>>>>>>>>>>
>>>>>>>>>>>> I'm not very clear on how option (1) solves the problem. Can
>>>>>>>>>>>> you explain it a bit? From my understanding, even if we create
>>>>>>>>>>>> a new interface, we will still have to selectively create the
>>>>>>>>>>>> different algorithms for each fold, based on the type of
>>>>>>>>>>>> algorithm the user picks, won't we?
>>>>>>>>>>>>
>>>>>>>>>>>> Or else, you can refactor the code, put that case statement
>>>>>>>>>>>> into a utility method, and call that method wherever you need
>>>>>>>>>>>> it.
>>>>>>>>>>>>
>>>>>>>>>>>> [1]
>>>>>>>>>>>> https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/util/MLUtils.html#kFold(org.apache.spark.rdd.RDD,%20int,%20int,%20scala.reflect.ClassTag)
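>>>>>>>>>>>> To illustrate the shape of what kFold returns, here is a
>>>>>>>>>>>> plain-Java sketch of the same idea (the real MLUtils.kFold
>>>>>>>>>>>> operates on RDDs and assigns elements to folds by seeded random
>>>>>>>>>>>> sampling; the round-robin assignment here is a simplification):

```java
import java.util.ArrayList;
import java.util.List;

public class KFoldSketch {

    /** For n instances and k folds, return k (train, validation) index
     *  pairs. Each instance lands in exactly one validation fold. */
    public static List<int[][]> kFold(int n, int k) {
        List<int[][]> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            List<Integer> train = new ArrayList<>();
            List<Integer> valid = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if (i % k == f) {
                    valid.add(i);   // held out in this fold
                } else {
                    train.add(i);   // used to train the base models
                }
            }
            folds.add(new int[][] { toArray(train), toArray(valid) });
        }
        return folds;
    }

    private static int[] toArray(List<Integer> xs) {
        int[] a = new int[xs.size()];
        for (int i = 0; i < xs.size(); i++) a[i] = xs.get(i);
        return a;
    }

    public static void main(String[] args) {
        List<int[][]> folds = kFold(6, 3);
        // fold 0 holds out indices {0, 3}; the other 4 indices are training
        System.out.println(java.util.Arrays.toString(folds.get(0)[1]));
        // -> [0, 3]
    }
}
```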
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Supun
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, May 18, 2016 at 8:41 PM, Misgana Negassi <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Supun,
>>>>>>>>>>>>>
>>>>>>>>>>>>> for my Stacking implementation I need predictions from other
>>>>>>>>>>>>> models on different folds of the training set. I have two
>>>>>>>>>>>>> problems.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The easy problem: Is there a method like
>>>>>>>>>>>>> spark.mllib.util.kFold for JavaRDD? Or should I implement it
>>>>>>>>>>>>> myself. I
>>>>>>>>>>>>> could not find anything in your utils.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The second problem: Given the folds of the training set, how
>>>>>>>>>>>>> do I get the predictions from the base models? As far as I could
>>>>>>>>>>>>> tell the
>>>>>>>>>>>>> ModelBuilder does not give me those. I see two possible solutions:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Create a new interface MachineLearningAlgorithm which
>>>>>>>>>>>>> provides train and predict methods and let each method implement
>>>>>>>>>>>>> this
>>>>>>>>>>>>> interface.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2) Copy the huge case statement from the ModelBuilder.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you have any preferences? Or more ideas?
>>>>>>>>>>>>>
>>>>>>>>>>>>> About the Eclipse error:
>>>>>>>>>>>>> I inserted an ignore tag in the pom.xml. After mvn clean, it
>>>>>>>>>>>>> no longer has the compilation error.
>>>>>>>>>>>>> But I decided to keep working with IntelliJ IDEA, unless you
>>>>>>>>>>>>> advise against it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Misgana
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 16.05.2016 14:37, Supun Sethunga wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you try doing a "mvn clean"?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you get any
>>>>>>>>>>>>>
>>>>>>>>>>>>> ...
>>
>> [Message clipped]
>
>
>
>
> --
> *Supun Sethunga*
> Senior Software Engineer
> WSO2, Inc.
> http://wso2.com/
> lean | enterprise | middleware
> Mobile : +94 716546324
> Blog: http://supunsetunga.blogspot.com
>
>
>
--
*Supun Sethunga*
Senior Software Engineer
WSO2, Inc.
http://wso2.com/
lean | enterprise | middleware
Mobile : +94 716546324
Blog: http://supunsetunga.blogspot.com
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev