Hi Minudika,

> Which options should be available for the user to get decisions? For
> example, if the user is going to use the bagging method, the number of
> samples can be pre-defined by the user.

I think that depends on the implementation. As you've mentioned, the number of samples would definitely have to be a user input. Other than that, the sample size, the algorithm to be used, its hyper-parameters, the aggregation criteria (if there are multiple ways of aggregating), etc. might have to be taken from the user.

Similarly, for stacking, we might have to get from the user: the number of models, the algorithm for each model, the hyper-parameters for each model, the algorithm for aggregation, etc.

Regards,
Supun

On Wed, Mar 2, 2016 at 3:16 AM, Minudika Malshan <[email protected]> wrote:

> Hi,
>
> Please help me to be clarified on this.
> Apart from the implementation of those ensemble methods at the back end,
> we're supposed to develop some UI features.
> Which options should be available for the user to get decisions? For
> example, if the user is going to use the bagging method, the number of
> samples can be pre-defined by the user.
>
> Regards,
> Minudika
>
> Minudika Malshan
> Undergraduate
> Department of Computer Science and Engineering
> University of Moratuwa.
>
> On Mon, Feb 29, 2016 at 11:04 AM, Supun Sethunga <[email protected]> wrote:
>
>> Hi Minudika,
>>
>> Thank you for your interest in the project.
>>
>> GBT and Random Forest are well known ensemble methods, and are readily
>> available as a single algorithm OOB in Spark. So we need not be
>> implementing them again. You may treat them as any other simpler algorithm,
>> for the project.
>>
>> Let me clarify a few things. For ensemble methods, you can consider the
>> following three options:
>>
>> - Stacking - Training multiple algos on the same data, and combining
>>   them using another algo.
>> - Bagging - Training a single algo over subsets of data.
>> - Boosting - Training multiple algos on the same data, and combining
>>   them over a weighted average.
>>
>> Personally I would prefer picking Stacking (since Boosting is a special
>> case of Stacking, the latter would cover both) and Bagging for
>> implementation, but you may pick appropriately. AFAIK these three methods
>> are not available OOB in Spark (except for Boosting in GBT and bagging
>> in Random Forest). Expectation of the project is to implement such a
>> logic, where a user can use any algorithm(s), pick the ensemble method, and
>> train a model.
>>
>> For bagging, you can use sampling techniques available in Spark (e.g.
>> rdd.sample(), df.sample() etc.) [1].
>>
>> Please do let us know if you need further clarifications.
>>
>> [1] http://spark.apache.org/docs/latest/api/java/
>>
>> Regards,
>> Supun
>>
>> On Mon, Feb 29, 2016 at 12:07 AM, Minudika Malshan <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I found out that spark.ml Lib supports two ensemble algorithms, GBT and
>>> Random Forest.
>>> Will it be possible to implement Bagging and boosting methods using ml
>>> Lib features?
>>>
>>> Also I'm grateful if you can give me some resources to getting started
>>> with implementation of Bagging method using ml Lib functionalities. If
>>> there's any other library which is allowed to use for this implementation,
>>> please let me know.
>>>
>>> Thanks and regards.
>>> Minudika
>>>
>>> Minudika Malshan
>>> Undergraduate
>>> Department of Computer Science and Engineering
>>> University of Moratuwa.
>>>
>>> _______________________________________________
>>> Dev mailing list
>>> [email protected]
>>> http://wso2.org/cgi-bin/mailman/listinfo/dev
>>
>> --
>> *Supun Sethunga*
>> Software Engineer
>> WSO2, Inc.
>> http://wso2.com/
>> lean | enterprise | middleware
>> Mobile : +94 716546324

--
*Supun Sethunga*
Software Engineer
WSO2, Inc.
http://wso2.com/
lean | enterprise | middleware
Mobile : +94 716546324
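
P.S. To make the bagging options discussed in this thread a bit more concrete (number of samples, sample size, base algorithm, its hyper-parameters, aggregation criterion), here is a rough plain-Python sketch. It is only an illustration under assumptions: it does not use Spark at all, the bootstrap step is done with Python's random module where a Spark implementation would presumably use rdd.sample() or df.sample(), and every name in it (BaggingOptions, train_bagged, majority_vote, threshold_learner, the toy data layout) is hypothetical rather than part of any Spark or WSO2 API.

```python
import random
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

Row = Tuple[float, int]            # (feature, label): toy data layout, an assumption
Model = Callable[[float], int]     # a trained model maps a feature to a label
Learner = Callable[[Sequence[Row], Dict[str, Any]], Model]


@dataclass
class BaggingOptions:
    """Hypothetical container for the options a user might supply."""
    num_samples: int               # number of bootstrap samples (one model each)
    sample_fraction: float         # size of each sample relative to the data
    learner: Learner               # base algorithm trained on every sample
    hyper_params: Dict[str, Any] = field(default_factory=dict)
    aggregate: Optional[Callable[[List[int]], int]] = None  # aggregation criterion
    seed: int = 0


def majority_vote(predictions: List[int]) -> int:
    """Default aggregation: return the most frequent predicted label."""
    return Counter(predictions).most_common(1)[0][0]


def threshold_learner(rows: Sequence[Row], hyper_params: Dict[str, Any]) -> Model:
    """Toy base algorithm: predict 1 when the feature exceeds a threshold
    placed midway between the two class means."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    if pos and neg:
        threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    else:
        # A bootstrap sample may contain one class only; fall back to a
        # user-supplied hyper-parameter in that case.
        threshold = hyper_params.get("default_threshold", 0.0)
    return lambda x: 1 if x > threshold else 0


def train_bagged(data: Sequence[Row], opts: BaggingOptions) -> Model:
    """Train one model per bootstrap sample and combine their predictions."""
    rng = random.Random(opts.seed)
    size = max(1, int(len(data) * opts.sample_fraction))
    models = []
    for _ in range(opts.num_samples):
        # Sampling with replacement, as in classic bagging.
        sample = [rng.choice(data) for _ in range(size)]
        models.append(opts.learner(sample, opts.hyper_params))
    aggregate = opts.aggregate or majority_vote
    return lambda x: aggregate([m(x) for m in models])


# Example usage on made-up data.
data = [(0.1, 0), (0.4, 0), (0.5, 0), (1.5, 1), (1.8, 1), (2.2, 1)]
opts = BaggingOptions(
    num_samples=5,
    sample_fraction=1.0,
    learner=threshold_learner,
    hyper_params={"default_threshold": 1.0},
)
ensemble = train_bagged(data, opts)
```

A UI for bagging would then only need to collect the fields of such an options object. A stacking variant would presumably need a list of learners with their hyper-parameters plus a separate aggregation algorithm, as mentioned in the reply.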
