Hi Minudika,

Thank you for your interest in the project.

GBT and Random Forest are well-known ensemble methods, and each is readily
available out of the box (OOB) in Spark as a single algorithm. So we need
not implement them again. For this project, you may treat them like any
other simple algorithm.

Let me clarify a few things. For ensemble methods, you can consider the
following three options:

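   - Stacking - Training multiple algos on the same data, and combining
   their predictions using another algo.
   - Bagging - Training a single algo over multiple random subsets
   (samples) of the data, and combining the resulting models.
   - Boosting - Training multiple algos in sequence on the same data
   (reweighted towards earlier mistakes), and combining them through a
   weighted average.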

Personally, I would prefer picking Stacking and Bagging for implementation
(since Boosting is a special case of Stacking, Stacking would cover both),
but you may pick as appropriate. AFAIK these three methods are not available
OOB in Spark as general-purpose methods (except for Boosting in GBT and
Bagging in Random Forest). The expectation of the project is to implement
logic where a user can choose any algorithm(s), pick the ensemble method,
and train a model.
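
To illustrate the Stacking option, here is a minimal sketch in plain Python
(not Spark code, and not the expected design). It assumes the base models
are trained on one part of the data and the combining algo on a held-out
part. The names train_stump, train_table and train_stacked are hypothetical,
and the threshold stump and lookup table are only stand-ins for whatever
algorithms the user picks.

```python
from collections import Counter, defaultdict


def train_stump(feature_index):
    """Return a trainer for a one-feature threshold classifier (stand-in base algo)."""
    def train(rows):
        xs0 = [f[feature_index] for f, y in rows if y == 0]
        xs1 = [f[feature_index] for f, y in rows if y == 1]
        # Assumes both classes are present and class 1 tends to have larger values.
        threshold = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2.0
        return lambda f: 1 if f[feature_index] > threshold else 0
    return train


def train_table(rows):
    """Stand-in combining algo: majority label per tuple of base predictions."""
    counts = defaultdict(Counter)
    for preds, y in rows:
        counts[preds][y] += 1
    fallback = Counter(y for _, y in rows).most_common(1)[0][0]
    table = {preds: c.most_common(1)[0][0] for preds, c in counts.items()}
    return lambda preds: table.get(preds, fallback)


def train_stacked(base_rows, meta_rows, base_trainers, meta_trainer):
    """Train base models, then train the combiner on their held-out predictions."""
    base_models = [trainer(base_rows) for trainer in base_trainers]
    meta_input = [(tuple(m(f) for m in base_models), y) for f, y in meta_rows]
    meta_model = meta_trainer(meta_input)
    return lambda f: meta_model(tuple(m(f) for m in base_models))


# Toy data: ((feature0, feature1), label). Only feature0 separates the classes.
base_rows = [((1.0, 5.0), 0), ((2.0, 1.0), 0), ((8.0, 6.0), 1), ((9.0, 2.0), 1)]
meta_rows = [((1.5, 6.0), 0), ((2.5, 1.5), 0), ((8.5, 5.5), 1), ((7.5, 1.0), 1)]

stacked = train_stacked(
    base_rows, meta_rows, [train_stump(0), train_stump(1)], train_table
)
```

On this toy data the combiner learns to follow the first stump and ignore
the second, so stacked((1.0, 9.0)) gives 0 and stacked((9.5, 0.5)) gives 1.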

For bagging, you can use the sampling techniques available in Spark (e.g.
rdd.sample(), df.sample(), etc.) [1].
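
To make the sampling idea concrete, here is a minimal sketch in plain Python
rather than Spark. It assumes each model is trained on a sample drawn with
replacement, which is roughly what rdd.sample(True, fraction) gives on an
RDD (Spark does not guarantee the exact sample size). The names
bootstrap_sample, train_stump, train_bagged and predict_bagged are
hypothetical, and the threshold stump is only a stand-in for the user's algo.

```python
import random
from collections import Counter


def bootstrap_sample(rows, fraction, rng):
    """Draw a sample with replacement (plain-Python stand-in for Spark sampling)."""
    n = max(1, int(len(rows) * fraction))
    return [rng.choice(rows) for _ in range(n)]


def train_stump(rows):
    """Stand-in base algo: threshold at the midpoint of the two class means."""
    xs0 = [x for x, y in rows if y == 0]
    xs1 = [x for x, y in rows if y == 1]
    if not xs0 or not xs1:
        # Degenerate sample containing one class only: always predict that class.
        only = 1 if xs1 else 0
        return lambda x: only
    # Assumes class 1 tends to have larger values than class 0.
    threshold = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2.0
    return lambda x: 1 if x > threshold else 0


def train_bagged(rows, train_fn, n_models=5, fraction=1.0, seed=42):
    """Train n_models copies of one algo, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [train_fn(bootstrap_sample(rows, fraction, rng)) for _ in range(n_models)]


def predict_bagged(models, x):
    """Combine the models by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Toy data: (value, label).
data = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
        (8.0, 1), (8.5, 1), (9.0, 1), (9.5, 1)]

bagged_models = train_bagged(data, train_stump)
```

Calling predict_bagged(bagged_models, x) then returns the majority vote of
the five models. In Spark, the sampling step would be replaced by the
sample() call and the stump by the chosen algorithm.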

Please do let us know if you need further clarifications.

[1] http://spark.apache.org/docs/latest/api/java/

Regards,
Supun

On Mon, Feb 29, 2016 at 12:07 AM, Minudika Malshan <[email protected]>
wrote:

> Hi,
>
> I found out that Spark MLlib supports two ensemble algorithms, GBT and
> Random Forest.
> Will it be possible to implement Bagging and Boosting methods using MLlib
> features?
>
> Also, I would be grateful if you could give me some resources for getting
> started with implementing the Bagging method using MLlib functionality. If
> there is any other library that we are allowed to use for this
> implementation, please let me know.
>
> Thanks and regards.
> Minudika
>
> Minudika Malshan
> Undergraduate
> Department of Computer Science and Engineering
> University of Moratuwa.
>
>
>
> _______________________________________________
> Dev mailing list
> [email protected]
> http://wso2.org/cgi-bin/mailman/listinfo/dev
>
>


-- 
*Supun Sethunga*
Software Engineer
WSO2, Inc.
http://wso2.com/
lean | enterprise | middleware
Mobile : +94 716546324
