Greetings mlpack family, I am Rishabh Garg, a second-year Computer Science student at IIT Mandi, India. I am very interested in pursuing the GSoC 2021 idea “Improving mlpack’s tree ensemble support” from the Ideas List. A few days ago, I shared another idea related to time series forecasting. I like both ideas and it is really difficult for me to choose between them, so maybe the mlpack family could help me figure out which one is better :-) I apologise in advance if this email gets too long.
I would like to implement a Gradient Boosting Classifier and Regressor as part of the project. The following is my plan of action.

After digging into the `trees` codebase in mlpack, I realised that we don’t have a regression tree, which is at the core of gradient boosted trees. Thus, the first priority would be to implement a `RegressionTree` class. I am thinking of making a base `DecisionTree` class from which `DecisionTreeClassifier` and `RandomForestClassifier` can inherit. This would require refactoring the existing code a little. Once the regression tree is ready, the gradient boosting algorithms can be implemented. Here too I am thinking of a similar approach: a base `GradientBoosting` class from which `GradientBoostingClassifier` and `GradientBoostingRegressor` can inherit.

One really nice feature I found in sklearn’s gradient boosting trees is that we can train additional estimator trees on top of an already trained model. This really helps during development when trying different hyperparameters, so I would love to integrate that feature into mlpack’s implementation too.

So, coding the algorithms, refactoring existing code, writing unit tests, adding documentation, making bindings, searching for good default hyperparameters, and adding tutorials/examples for the three new algorithms would be enough to keep me occupied for the whole summer. I don’t want to be too ambitious, but if time still permits, I might look into implementing XGBoost. Once gradient boosted trees are implemented, it would be slightly easier to implement XGBoost. But given that XGBoost is really “Xtreme” due to its weighted quantile sketch, parallel learning, out-of-core computation, etc., it would be really difficult to finish it along with the other algorithms within the GSoC time period. I would love to hear suggestions from the community.
Also, if my idea and goals seem plausible, I would love to provide a more detailed proposal of what I would be doing: what the API would look like, how the end user would use these classes, more implementation details or pseudocode, a project timeline, etc. The mentor for this project is not listed on the GSoC Ideas page <https://github.com/mlpack/mlpack/wiki/SummerOfCodeIdeas>, so I would love to know who will be mentoring it. Also, if it feels like there are any flaws in the idea, please provide your valuable feedback. Looking forward to your replies, and thanks for reading till the end. Best regards, Rishabh Garg GitHub - RishabhGarg108 <https://github.com/RishabhGarg108>
_______________________________________________ mlpack mailing list [email protected] http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
