Hello everyone,

Following up on my previous email: I made a small typo there. It should be `DecisionTreeRegressor` instead of `RandomForestClassifier`.
After giving it deeper thought, I realised there is so much more I can do with gradient boosted trees, like adding feature importance, warm start, pruning, etc. So I have decided to drop the XGBoost idea from the project, and I will invest the remaining time in implementing these extra features.

I have been digging deep into the decision tree implementation and found that it is built very flexibly: a regression tree can be implemented just by adding a new template parameter (which will specify whether we want classification or regression) and a few overloads of the existing helper functions. So I think there will be no need for an abstract class, and regression can be implemented without any drastic refactoring of the existing `DecisionTree` class, although we will need to add a few fitness functions. I will share the full technical details in my proposal.

Looking forward to your feedback.

Thanks and regards,
Rishabh Garg

On Tue, Mar 16, 2021 at 4:10 PM RISHABH GARG <[email protected]> wrote:

> Greetings mlpack family,
>
> I am Rishabh Garg, a 2nd-year Computer Science student at IIT Mandi, India. I am very interested in pursuing the GSoC idea of “Improving mlpack’s tree ensemble support” posted on the GSoC Idea List for 2021. A few days ago, I shared another idea related to time series forecasting. I like both ideas and it is really difficult for me to choose one, so maybe the mlpack family could help me figure out which one is better :-)
>
> I apologise in advance if this email gets too long.
>
> I would like to implement a gradient boosting classifier and regressor as part of the project. The following is my plan of action.
>
> After digging into the `trees` codebase in mlpack, I realised that we don’t have a regression tree. A regression tree is at the core of gradient boosted trees, so the first priority would be to implement a `RegressionTree` class.
> I am thinking of making a base `DecisionTree` class from which `DecisionTreeClassifier` and `RandomForestClassifier` can inherit. This means the existing code would need a little refactoring.
>
> Once the regression tree is ready, the gradient boosting tree algorithms can be implemented. For them, too, I am thinking of a similar approach: a base `GradientBoosting` class from which `GradientBoostingClassifier` and `GradientBoostingRegressor` can inherit.
>
> One really nice feature I found in sklearn’s gradient boosting trees is that we can train additional estimator trees on top of an already-trained model. This really helps during development when we are trying different hyperparameters, so I would love to integrate that feature into mlpack’s implementation too.
>
> So, coding the algorithms, refactoring existing code, writing unit tests, adding documentation, making bindings, searching for good default hyperparameters, and adding tutorials/examples for the three added algorithms would be enough to keep me occupied for the whole summer. I don’t want to be too ambitious, but if time still permits, I might look into implementing XGBoost. Once the gradient boosting trees are implemented, it would be slightly easier to implement XGBoost. But given that XGBoost is really Xtreme, with its weighted quantiles, parallel learning, out-of-cache optimisation, etc., it would be really difficult to finish it along with the other algorithms within the GSoC time period.
>
> I would love to hear suggestions from the community. Also, if my idea and goals seem plausible, I would love to provide a more detailed proposal of what I would be doing: how the API would look, how the end user would use these classes, more implementation details or pseudocode, a timeline for the project, etc.
> The mentor for this project is not listed on the GSoC Ideas page <https://github.com/mlpack/mlpack/wiki/SummerOfCodeIdeas>. I would love to know who will be mentoring it.
>
> Also, if there are any flaws in the idea, please provide your valuable feedback.
>
> Looking forward to your replies. Thanks for reading till the end.
>
> Best regards,
> Rishabh Garg
> GitHub - RishabhGarg108 <https://github.com/RishabhGarg108>
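[Editor's note: as a rough illustration of the template-parameter idea discussed above, here is a standalone C++ sketch. It is not actual mlpack code; `GiniImpurity`, `MSEImpurity`, and `NodeImpurity` are hypothetical names, chosen only to show how a single fitness-function template parameter could switch a tree node between a classification and a regression criterion.]

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

// Classification criterion: Gini impurity over discrete labels.
struct GiniImpurity
{
  static double Evaluate(const std::vector<double>& labels)
  {
    std::map<double, std::size_t> counts;
    for (const double label : labels)
      ++counts[label];

    double impurity = 1.0;
    for (const auto& entry : counts)
    {
      const double p = static_cast<double>(entry.second) / labels.size();
      impurity -= p * p;
    }
    return impurity;
  }
};

// Regression criterion: variance of the responses (MSE around the mean).
struct MSEImpurity
{
  static double Evaluate(const std::vector<double>& responses)
  {
    double mean = 0.0;
    for (const double r : responses)
      mean += r;
    mean /= responses.size();

    double mse = 0.0;
    for (const double r : responses)
      mse += (r - mean) * (r - mean);
    return mse / responses.size();
  }
};

// The fitness function is the only thing that differs between the
// classification and regression cases; the node-level logic is generic
// over it, which is the crux of the template-parameter approach.
template<typename FitnessFunction>
double NodeImpurity(const std::vector<double>& y)
{
  return FitnessFunction::Evaluate(y);
}
```

With this shape, `NodeImpurity<GiniImpurity>(labels)` acts as a classification criterion and `NodeImpurity<MSEImpurity>(responses)` as a regression one, with no inheritance or abstract base class needed.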
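[Editor's note: the warm-start feature mentioned above (training additional estimators on an already-trained model, as sklearn's `warm_start` option does) can be sketched with the toy C++ class below. `ToyBooster` is purely illustrative, not mlpack API: its "weak learner" is just a constant fit to the current residuals, standing in for a real regression tree.]

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Toy gradient booster with squared loss. Each "learner" is a single
// constant (the mean residual); a real implementation would store trees.
class ToyBooster
{
 public:
  // If warmStart is true, keep previously trained learners and append
  // numTrees new ones; otherwise discard them and train from scratch.
  void Train(const std::vector<double>& y,
             const std::size_t numTrees,
             const bool warmStart = false)
  {
    if (!warmStart)
      learners.clear();

    for (std::size_t i = 0; i < numTrees; ++i)
    {
      // Fit the new learner to the current residuals.
      double residualMean = 0.0;
      for (const double v : y)
        residualMean += v - Predict();
      residualMean /= y.size();
      learners.push_back(residualMean);
    }
  }

  // The ensemble prediction is the sum of all learners' outputs.
  double Predict() const
  {
    double out = 0.0;
    for (const double l : learners)
      out += l;
    return out;
  }

  std::size_t NumTrees() const { return learners.size(); }

 private:
  std::vector<double> learners;
};
```

The point of the sketch is the API shape: `Train(y, n, true)` grows the existing ensemble by `n` estimators instead of retraining, which is what makes hyperparameter experimentation cheap.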
_______________________________________________
mlpack mailing list
[email protected]
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
