Hello Ryan,

Thanks for sharing the approach. I like it and we could probably use it, but while writing the proposal I realised it might be too much to take on alongside everything else in the limited GSoC period.
That said, what we implement can only really be called XGBoost once it scales to larger datasets, so at some point we will have to implement this too. Would it be okay to shift it to post-GSoC?

Thanks,
Rishabh

On Mon, 29 Mar 2021, 17:33 Ryan Curtin, <[email protected]> wrote:

> On Mon, Mar 29, 2021 at 10:59:32AM +0530, RISHABH GARG wrote:
> > Hey Ryan, thanks for the feedback.
> >
> > I also agree with you. XGBoost is one of the most widely used ML
> > algorithms. It would be really great for MLPACK to have it and this will
> > undoubtedly attract more and more users to MLPACK. This discussion with
> > you, has changed my perspective and I think we can prioritise XGBoost
> > over others.
> >
> > As you mentioned in the previous mail, it will be straightforward to
> > implement the core XGBoost algorithm provided the flexible implementation
> > of trees in MLPACK. But, how can we implement optimisations like
> > cache-aware access and out-of-core computation with armadillo matrices? I
> > remember I had a chat with you related to this and you slightly mentioned
> > that it can be done with a simple tweak. Can you please elaborate it a
> > bit?
>
> I wouldn't worry about out-of-core learning for your proposal---ideally
> we should just be able to demonstrate that the performance of what we
> implement is comparable to XGBoost's performance.
>
> That said, if you are interested in doing out-of-core learning, the way
> I know to do it is to create a file of the right size on disk (e.g.
> n_rows * n_cols * sizeof(double) bytes). Then, in your program, use
> mmap() to memory map the file. This will give you a pointer to some
> memory, which you can cast to a double*. You can then use the Armadillo
> advanced constructor that takes a memory pointer to create the Armadillo
> matrix that is wrapped around the mmap()-ed file. Now, ta-da, you have
> an out-of-core matrix. :) (But some restrictions are that you can't
> resize it, and operations on that matrix that result in a new matrix
> will not be mmap()-ed.)
>
> Anyway, hope that is helpful!
>
> Thanks,
>
> Ryan
>
> --
> Ryan Curtin | "If it's something that can be stopped, then just try to
> stop it!"
> [email protected] | - Skull Kid
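
For reference, the mmap() recipe quoted above could look roughly like the sketch below. It is only an illustration under stated assumptions: a POSIX system, and hypothetical names (`MappedMatrix`, `MapMatrixFile`, `UnmapMatrix`) that are not part of mlpack or Armadillo. To keep the snippet free of external dependencies, the Armadillo step is shown as a comment only; it uses the advanced constructor that takes a memory pointer, with `copy_aux_mem = false` so the matrix reuses the mapped memory instead of copying it.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <unistd.h>    // ftruncate, close

// Hypothetical helper type (not an mlpack or Armadillo class): a view over a
// file-backed buffer of n_rows * n_cols doubles.
struct MappedMatrix
{
  double* mem = nullptr;
  std::size_t n_rows = 0;
  std::size_t n_cols = 0;

  std::size_t Bytes() const { return n_rows * n_cols * sizeof(double); }

  // Column-major indexing, the same element layout Armadillo uses.
  double& At(std::size_t r, std::size_t c)
  {
    assert(r < n_rows && c < n_cols);
    return mem[c * n_rows + r];
  }
};

// Create (or reuse) a file of the right size and memory map it.
// Assumes n_rows * n_cols > 0; mmap() rejects zero-length mappings.
inline MappedMatrix MapMatrixFile(const std::string& path,
                                  std::size_t n_rows,
                                  std::size_t n_cols)
{
  MappedMatrix m;
  m.n_rows = n_rows;
  m.n_cols = n_cols;

  const int fd = open(path.c_str(), O_RDWR | O_CREAT, 0644);
  if (fd == -1)
    throw std::runtime_error("open() failed for " + path);

  // Size the file to n_rows * n_cols * sizeof(double) bytes.
  if (ftruncate(fd, static_cast<off_t>(m.Bytes())) == -1)
  {
    close(fd);
    throw std::runtime_error("ftruncate() failed for " + path);
  }

  // MAP_SHARED so that writes through the pointer reach the file.
  void* p = mmap(nullptr, m.Bytes(), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close(fd);  // The mapping stays valid after the descriptor is closed.
  if (p == MAP_FAILED)
    throw std::runtime_error("mmap() failed for " + path);

  m.mem = static_cast<double*>(p);
  return m;
}

// Release the mapping; the file itself is left on disk.
inline void UnmapMatrix(MappedMatrix& m)
{
  if (m.mem != nullptr)
  {
    munmap(m.mem, m.Bytes());
    m.mem = nullptr;
  }
}

// With Armadillo available, the mapped memory could then be wrapped as:
//
//   arma::mat X(m.mem, m.n_rows, m.n_cols,
//               /* copy_aux_mem = */ false, /* strict = */ true);
//
// X must not outlive the mapping, so destroy it before calling UnmapMatrix().
```

Passing `strict = true` in the commented Armadillo line binds the matrix to the mapped memory for its lifetime, which lines up with the restriction mentioned above that the matrix cannot be resized. Any operation that produces a new matrix still allocates ordinary heap memory.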
_______________________________________________
mlpack mailing list
[email protected]
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
