Hi everyone,

This mail follows up on the previous discussion about my proposal for GSoC 2021. I spent the past few days evaluating the feasibility of implementing multiple algorithms, and I have decided to focus all of my time on implementing the XGBoost algorithm.
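To make the scope concrete: the heart of an XGBoost tree builder is the second-order split gain and leaf-weight computation from the original paper (Chen & Guestrin, 2016). The snippet below is only a rough sketch of that math to show what the implementation revolves around; the names (GradStats, LeafWeight, SplitGain) are placeholders I made up, not mlpack's API or a final design.

// Rough illustrative sketch of XGBoost's second-order split statistics.
// Names and structure are placeholders, not mlpack code.
struct GradStats
{
  double g = 0.0; // sum of first-order gradients in a node
  double h = 0.0; // sum of second-order gradients (Hessians) in a node
};

// Optimal leaf weight with L2 regularization lambda: w* = -G / (H + lambda).
inline double LeafWeight(const GradStats& s, const double lambda)
{
  return -s.g / (s.h + lambda);
}

// Score of a node given its gradient sums.
inline double Score(const double G, const double H, const double lambda)
{
  return (G * G) / (H + lambda);
}

// Gain of splitting a node into (left, right); gamma penalizes each extra leaf.
inline double SplitGain(const GradStats& left, const GradStats& right,
                        const double lambda, const double gamma)
{
  return 0.5 * (Score(left.g, left.h, lambda)
              + Score(right.g, right.h, lambda)
              - Score(left.g + right.g, left.h + right.h, lambda)) - gamma;
}

The gamma term also doubles as a pruning threshold: splits whose gain is not positive can be removed.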
Specifically, I would like to implement an XGBoost Regressor and Classifier. This would involve adding support for XGBoost trees. Additionally, I am looking into adding support for pruning, the approximate greedy split-finding algorithm (to speed up training on large datasets), and feature importance. I will consolidate the details in a draft proposal soon. Any opinions or suggestions are welcome.

Regards,
Anush Kini

On Wed, Mar 17, 2021 at 10:42 AM Anush Kini <[email protected]> wrote:

> Hi German,
>
> Thanks for the feedback. I agree that it is better to commit to implementing one algorithm completely than to implement many partially. I will keep this in mind in my proposal.
>
> Regards,
> Anush Kini
>
> On Mon, Mar 15, 2021 at 11:14 PM Germán Lancioni <[email protected]> wrote:
>
>> Hi Anush,
>>
>> This is a great area to work on. As Omar mentioned, a good scope maximizes and focuses your GSoC effort. If you notice that the available GSoC time is not enough, I would recommend implementing just one of the algorithms, e.g. XGBoost, so you can concentrate on completing it instead of stretching your time across three.
>>
>> Looking forward to your proposal, very exciting!
>>
>> Regards,
>> German
>>
>> From: mlpack <[email protected]> on behalf of Anush Kini <[email protected]>
>> Sent: Monday, March 15, 2021 09:14 AM
>> To: Omar Shrit <[email protected]>
>> Cc: [email protected] <[email protected]>
>> Subject: Re: [mlpack] Potential Proposal for GSoC 2021
>>
>> Hi Omar,
>>
>> Thank you for the inputs. What you said makes complete sense to me.
>>
>> I will prioritise algorithm correctness, detailed documentation, and tutorials over implementing multiple features. Additionally, I will highlight a proof of concept through sample code and metrics in my proposal.
>>
>> Thanks & Regards,
>> Anush Kini
>>
>> On Mon, Mar 15, 2021 at 3:43 PM Omar Shrit <[email protected]> wrote:
>>
>> Hello Anush,
>>
>> XGBoost, LightGBM, and CatBoost would be a great addition to mlpack this year. Since GSoC is shorter, I would concentrate on these algorithms, with the relevant tests and examples.
>>
>> You need to demonstrate in your proposal that you have a good knowledge of decision tree algorithms. As always, a good starting point is a proof of concept with relevant benchmarks.
>>
>> These are my suggestions; I hope you find them helpful.
>>
>> Thanks,
>>
>> Omar
>>
>> On 03/14, Anush Kini wrote:
>> > Hi mlpack team,
>> >
>> > I am Anush Kini. My GitHub handle is Abilityguy <https://github.com/Abilityguy>.
>> >
>> > I have been getting familiar with the code base for the last couple of months. I am planning to apply for GSoC 2021 and wanted some feedback on my project proposal.
>> >
>> > I am building on the 'Improve mlpack's tree ensemble support' idea from the wiki. I would like to implement the XGBoost and LightGBM algorithms. If the schedule permits, I will look towards implementing CatBoost too.
>> >
>> > Additionally, I would like to work on bringing some additional features to the ensemble suite:
>> > 1. I would like to dig into 2619 <https://github.com/mlpack/mlpack/issues/2619>, which aims to add regression support to Random Forests.
>> > 2. Implementing methods to get the impurity-based feature importance, similar to the one in scikit-learn <https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier.feature_importances_>.
>> >
>> > Finally, I plan to supplement any new features implemented with tutorials in mlpack/examples <https://github.com/mlpack/examples>.
>> > Looking forward to hearing your opinions and suggestions.
>> >
>> > Thanks & Regards,
>> > Anush Kini
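As a note on the impurity-based feature importance mentioned in the original proposal above: the sketch below only illustrates the usual computation behind scikit-learn's feature_importances_ (summing each feature's weighted impurity decrease over a tree's split nodes). The Node struct and the Accumulate function are hypothetical, not mlpack classes.

// Illustrative sketch of impurity-based feature importance for one tree,
// in the spirit of scikit-learn's feature_importances_.  Hypothetical types.
#include <cstddef>
#include <vector>

struct Node
{
  bool isLeaf;
  size_t splitFeature;  // feature used at this split node
  double numSamples;    // (weighted) number of samples reaching this node
  double impurity;      // e.g. Gini impurity or variance of this node
  const Node* left;
  const Node* right;
};

// Accumulate, for every split node, the impurity decrease it achieves,
// weighted by the fraction of samples that reach it.
void Accumulate(const Node& node, const double totalSamples,
                std::vector<double>& importances)
{
  if (node.isLeaf)
    return;

  const double decrease =
      node.numSamples * node.impurity
      - node.left->numSamples * node.left->impurity
      - node.right->numSamples * node.right->impurity;
  importances[node.splitFeature] += decrease / totalSamples;

  Accumulate(*node.left, totalSamples, importances);
  Accumulate(*node.right, totalSamples, importances);
}

For an ensemble, the same accumulation would run over every tree, with the per-tree results averaged and normalized to sum to one.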
