Hi Anush,

Sounds good. Hopefully you can re-use the existing decision tree infrastructure 
in mlpack to build XGB.
If you believe the scope of work is big, you can always aim to implement 
just one tree method (e.g. exact greedy or histogram-based) rather than 
all of them, but I will let you judge how much time it may take.

Looking forward to the proposal.

Regards,
German

________________________________
From: mlpack <[email protected]> on behalf of Anush Kini 
<[email protected]>
Sent: Sunday, March 28, 2021 09:50 AM
To: [email protected] <[email protected]>
Subject: Re: [mlpack] Potential Proposal for GSoC 2021

Hi everyone,

This mail continues the previous discussion on a proposal for GSoC 2021.
I spent the past few days assessing the feasibility of implementing 
multiple algorithms, and I have decided to focus all my time on 
implementing the XGBoost algorithm.

Specifically, I would like to implement an XGBoost Regressor and Classifier. 
This would involve adding support for XGBoost trees.
Additionally, I am looking into adding pruning, approximate greedy split 
finding (to speed up training on large datasets), and feature importance.

Will consolidate the details in a draft proposal soon.
Any opinions or suggestions are welcome.

Regards,
Anush Kini

On Wed, Mar 17, 2021 at 10:42 AM Anush Kini 
<[email protected]<mailto:[email protected]>> wrote:
Hi German,

Thanks for the feedback.
I agree; it is better to commit to fully implementing one algorithm than to 
partially implementing many.
Will consider this in my proposal.

Regards,
Anush Kini

On Mon, Mar 15, 2021 at 11:14 PM Germán Lancioni 
<[email protected]<mailto:[email protected]>> wrote:
Hi Anush,

This is a great area to work on. As Omar mentioned, a good scope maximizes and 
focuses your GSoC effort. If you notice that the available GSoC time is not 
enough, I would recommend implementing just one of the algorithms, e.g. XGB, 
so you can concentrate on completing it instead of stretching your time 
across three.

Looking forward to your proposal, very exciting!

Regards,
German

________________________________
From: mlpack <[email protected]> on behalf of Anush Kini 
<[email protected]>
Sent: Monday, March 15, 2021 09:14 AM
To: Omar Shrit <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: [mlpack] Potential Proposal for GSoC 2021

Hi Omar,

Thank you for the inputs.
What you said makes complete sense to me.

I will prioritise algorithm correctness, detailed documentation, and 
tutorials over implementing multiple features.
Additionally, I will highlight a proof of concept through sample code and 
metrics in my proposal.

Thanks & Regards,
Anush Kini

On Mon, Mar 15, 2021 at 3:43 PM Omar Shrit 
<[email protected]<mailto:[email protected]>> wrote:
Hello Anush,

XGBoost, LightGBM and CatBoost algorithms will be a great addition to
mlpack this year. Since GSoC is shorter this year, I would concentrate on
these algorithms, along with the relevant tests and examples.

You need to demonstrate in your proposal that you have good knowledge
of decision tree algorithms. As always, a good starting point is a proof
of concept with relevant benchmarks.

These are my suggestions, hope you find this helpful.

Thanks,

Omar

On 03/14, Anush Kini wrote:
> Hi mlpack team,
>
> I am Anush Kini. My GitHub handle is Abilityguy
> <https://github.com/Abilityguy>.
>
> I have been getting familiar with the code base for the last couple of
> months.
> I am planning to apply for GSoC 2021 and wanted some feedback on my project
> proposal for the same.
>
> I am building on the 'Improve mlpack's tree ensemble support' idea from the
> wiki.
> I would like to implement XGBoost and LightGBM algorithms. If the schedule
> permits, I will look towards implementing CatBoost too.
>
> Additionally, I would like to work on bringing some additional features to
> the ensemble suite:
> 1. I would like to look into issue 2619
> <https://github.com/mlpack/mlpack/issues/2619>, which aims to add
> regression support to Random Forests.
> 2. Implementing methods to compute impurity-based feature importance,
> similar to the one in scikit-learn
> <https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier.feature_importances_>
> .
>
> Finally, I plan to supplement any new features implemented with tutorials
> in mlpack/examples <https://github.com/mlpack/examples>.
> Looking forward to hearing your opinions and suggestions.
>
> Thanks & Regards,
> Anush Kini

> _______________________________________________
> mlpack mailing list
> [email protected]<mailto:[email protected]>
> http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack

