On Mon, Mar 29, 2021 at 10:59:32AM +0530, RISHABH GARG wrote:
> Hey Ryan, thanks for the feedback.
>
> I also agree with you. XGBoost is one of the most widely used ML
> algorithms. It would be really great for MLPACK to have it, and it
> will undoubtedly attract more and more users to MLPACK. This
> discussion with you has changed my perspective, and I think we can
> prioritise XGBoost over the others.
>
> As you mentioned in the previous mail, it will be straightforward to
> implement the core XGBoost algorithm given the flexible implementation
> of trees in MLPACK. But how can we implement optimisations like
> cache-aware access and out-of-core computation with Armadillo
> matrices? I remember I had a chat with you about this, and you briefly
> mentioned that it could be done with a simple tweak. Can you please
> elaborate on it a bit?
I wouldn't worry about out-of-core learning for your proposal---ideally,
we should just be able to demonstrate that the performance of what we
implement is comparable to XGBoost's performance.

That said, if you are interested in doing out-of-core learning, the way
I know to do it is to create a file of the right size on disk (e.g.,
n_rows * n_cols * sizeof(double) bytes). Then, in your program, use
mmap() to memory-map the file. This will give you a pointer to some
memory, which you can cast to a double*. You can then use the Armadillo
advanced constructor that takes a memory pointer to create an Armadillo
matrix wrapped around the mmap()-ed file. Now, ta-da, you have an
out-of-core matrix. :)

(Some restrictions are that you can't resize it, and operations on that
matrix that result in a new matrix will not be mmap()-ed.)
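For concreteness, here is a minimal sketch of that idea (untested,
POSIX-only; the filename "matrix.bin" and the dimensions are just
placeholders):

    #include <armadillo>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main()
    {
      const arma::uword nRows = 100000;
      const arma::uword nCols = 10;
      const size_t bytes = nRows * nCols * sizeof(double);

      // Create the backing file on disk and grow it to the right size.
      int fd = open("matrix.bin", O_RDWR | O_CREAT, 0644);
      if (fd == -1)
        return 1;
      if (ftruncate(fd, bytes) == -1)
        return 1;

      // Memory-map the file; the kernel pages data in and out as needed.
      void* mem = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
      if (mem == MAP_FAILED)
        return 1;

      // Wrap the mapped memory with Armadillo's advanced constructor:
      // copy_aux_mem = false uses the memory directly (no copy), and
      // strict = true forbids resizing (which would reallocate).
      arma::mat data((double*) mem, nRows, nCols, false, true);

      data.fill(1.0);                // writes go through to the file
      double sum = arma::accu(data); // reads page in from the file
      (void) sum;

      munmap(mem, bytes);
      close(fd);
      return 0;
    }

Note that any expression on that matrix that produces a new matrix
(e.g., assigning data.t() to a fresh arma::mat) allocates ordinary RAM;
only the wrapped matrix itself is backed by the file.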
Anyway, hope that is helpful!

Thanks,
Ryan

--
Ryan Curtin      | "If it's something that can be stopped, then just
[email protected] | try to stop it!"  - Skull Kid

_______________________________________________
mlpack mailing list
[email protected]
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack