On Friday, 14 April 2017 at 17:55:54 UTC, jmh530 wrote:
On Thursday, 13 April 2017 at 11:23:32 UTC, jmh530 wrote:

Just an FYI, I was looking at another post

http://www.active-analytics.com/blog/fitting-glm-with-large-datasets/

and the top part is a little confusing because the code below it switches to computing CC = BB*AA instead of CC = AA*BB.

If I'm understanding it correctly, you originally have an m×n matrix times an n×p matrix; you then partition the left-hand side into m×k blocks and the right-hand side into k×p blocks, loop through, and add the partial products. However, at the top you say that A (which at the top is the left-hand operand) is split up by rows, while the code clearly splits the left-hand side (B here) by columns (BB is 5×100 and B is a list of ten 5×10 matrices).
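For concreteness, here is a minimal NumPy sketch of that partitioning. The shapes mirror the 5×100 example from the thread; the variable names and AA's column count of 8 are my own illustrative choices, not the blog's code:

import numpy as np

# BB is 5x100, AA is 100x8, so CC = BB @ AA is 5x8.
rng = np.random.default_rng(0)
BB = rng.standard_normal((5, 100))
AA = rng.standard_normal((100, 8))

k = 10  # number of blocks along the shared (inner) dimension
# Split the left operand by columns and the right operand by rows,
# then sum the partial products over the repeated index.
B_blocks = np.hsplit(BB, k)   # ten 5x10 blocks
A_blocks = np.vsplit(AA, k)   # ten 10x8 blocks

CC = sum(b @ a for b, a in zip(B_blocks, A_blocks))

assert np.allclose(CC, BB @ AA)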

Sorry, I didn't see your question until now. That article is something I worked on years ago. The main principle is that you split and aggregate over repeated indices; the code is intended to illustrate that principle, so don't get too hung up on equating the code symbols with the equations. The principle is the main thing. I wrote an R package, with the important bits written in C++, that uses this principle for GLMs: https://cran.r-project.org/web/packages/bigReg/index.html
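To make "split and aggregate over repeated indices" concrete in the model-fitting setting, here is a rough Python sketch that accumulates the normal-equations pieces over row chunks so the full design matrix never has to sit in memory. The function name and chunking scheme are illustrative assumptions, not the bigReg API, and the plain least-squares solve stands in for the weighted step inside a full GLM fit:

import numpy as np

def chunked_normal_equations(chunks):
    # Accumulate X'X and X'y over (X_i, y_i) row chunks.
    XtX = Xty = None
    for X_i, y_i in chunks:
        if XtX is None:
            XtX = X_i.T @ X_i
            Xty = X_i.T @ y_i
        else:
            XtX += X_i.T @ X_i
            Xty += X_i.T @ y_i
    # Solve the aggregated system for the coefficients.
    return np.linalg.solve(XtX, Xty)

# Toy usage: ten chunks of 100 rows each, 5 predictors.
rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 5))
beta_true = np.arange(1.0, 6.0)
y = X @ beta_true + 0.1 * rng.standard_normal(1000)

chunks = zip(np.vsplit(X, 10), np.split(y, 10))
beta_hat = chunked_normal_equations(chunks)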

MORE IMPORTANTLY, however, that algorithm is not efficient! At least not as efficient as gradient descent, or better still stochastic gradient descent, or their respective modifications.
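For comparison, a bare-bones stochastic gradient descent loop for logistic regression (a GLM) looks like the sketch below; the learning rate, epoch count, and synthetic data are arbitrary illustrative choices:

import numpy as np

def sgd_logistic(X, y, lr=0.1, epochs=5, rng=None):
    # Plain SGD for logistic regression, one observation per update.
    rng = rng or np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(epochs):
        for i in rng.permutation(n):
            p = 1.0 / (1.0 + np.exp(-X[i] @ w))  # predicted probability
            w -= lr * (p - y[i]) * X[i]          # gradient of the log-loss
    return w

# Toy usage on synthetic data:
rng = np.random.default_rng(2)
X = rng.standard_normal((500, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)
w_hat = sgd_logistic(X, y)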

