Hi,

Could you please clarify the following for me?
Apart from implementing those ensemble methods in the back end, we're
also supposed to develop some UI features.
Which options should be exposed to the user for configuration? For
example, if the user chooses the bagging method, the number of
samples could be defined by the user.
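
As a rough illustration of such user-facing options, here is a minimal
sketch in plain Python (not Spark code). The parameter names num_samples
and sample_fraction, the helper names, and the stub learner are all
hypothetical and only serve to show which values a UI might collect:

```python
import random
from collections import Counter


def bootstrap_sample(rows, fraction, rng):
    """Draw rows with replacement; the size is a fraction of the data."""
    size = max(1, int(len(rows) * fraction))
    return [rng.choice(rows) for _ in range(size)]


def train_majority_stub(rows):
    """Hypothetical stand-in for a real learner.

    It always predicts the most common label seen in its training sample.
    """
    top_label = Counter(label for _, label in rows).most_common(1)[0][0]
    return lambda features: top_label


def train_bagged(rows, train_fn, num_samples, sample_fraction=1.0, seed=0):
    """Train one model per bootstrap sample.

    num_samples and sample_fraction are the kind of values a user
    could set from the UI (assumed names, not an existing API).
    """
    rng = random.Random(seed)
    return [
        train_fn(bootstrap_sample(rows, sample_fraction, rng))
        for _ in range(num_samples)
    ]


def predict_bagged(models, features):
    """Combine the individual models by majority vote."""
    votes = Counter(model(features) for model in models)
    return votes.most_common(1)[0][0]
```

With something like this, the UI would only need to pass the chosen
base algorithm (train_fn here), the number of samples, and the sample
fraction to the back end.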

Regards,
Minudika

Minudika Malshan
Undergraduate
Department of Computer Science and Engineering
University of Moratuwa.




On Mon, Feb 29, 2016 at 11:04 AM, Supun Sethunga <[email protected]> wrote:

> Hi Minudika,
>
> Thank you for your interest in the project.
>
> GBT and Random Forest are well-known ensemble methods, and each is readily
> available as a single algorithm out of the box (OOB) in Spark, so we need
> not implement them again. For the project, you may treat them like any
> other simpler algorithm.
>
> Let me clarify a few things. For ensemble methods, you can consider the
> following three options:
>
>    - Stacking - Training multiple algos on the same data, and combining
>    them using another algo.
>    - Bagging - Training a single algo over random subsets of the data, and
>    combining the resulting models.
>    - Boosting - Training multiple algos on the same data, and combining
>    them over a weighted average.
>
> Personally I would prefer picking Stacking (since Boosting is a special
> case of Stacking, the latter would cover both) and Bagging for
> implementation, but you may pick as appropriate. AFAIK these three methods
> are not available OOB in Spark (except for Boosting in GBT and Bagging in
> Random Forest). The expectation of the project is to implement logic where
> a user can choose any algorithm(s), pick the ensemble method, and train a
> model.
>
> For bagging, you can use the sampling techniques available in Spark (e.g.
> rdd.sample(), df.sample(), etc.) [1].
>
> Please do let us know if you need further clarifications.
>
> [1] http://spark.apache.org/docs/latest/api/java/
>
> Regards,
> Supun
>
> On Mon, Feb 29, 2016 at 12:07 AM, Minudika Malshan <[email protected]>
> wrote:
>
>> Hi,
>>
>> I found out that Spark MLlib supports two ensemble algorithms, GBT and
>> Random Forest.
>> Would it be possible to implement the Bagging and Boosting methods using
>> MLlib features?
>>
>> Also, I'd be grateful if you could point me to some resources for getting
>> started with implementing the Bagging method using MLlib functionality. If
>> there's any other library that we are allowed to use for this
>> implementation, please let me know.
>>
>> Thanks and regards.
>> Minudika
>>
>> Minudika Malshan
>> Undergraduate
>> Department of Computer Science and Engineering
>> University of Moratuwa.
>>
>>
>>
>> _______________________________________________
>> Dev mailing list
>> [email protected]
>> http://wso2.org/cgi-bin/mailman/listinfo/dev
>>
>>
>
>
> --
> *Supun Sethunga*
> Software Engineer
> WSO2, Inc.
> http://wso2.com/
> lean | enterprise | middleware
> Mobile : +94 716546324
>
