Well, I was trying to implement the RainForest algorithm, based on the following paper: "RainForest - A Framework for Fast Decision Tree Construction of Large Datasets".
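The key structure in the paper is the AVC-set (Attribute-Value, Classlabel): for a given tree node and attribute, it aggregates the number of records for each (attribute value, class label) pair. The AVC-sets are usually small enough to fit in memory even when the dataset itself is disk-resident, so the tree can be grown with sequential scans over the data. Here is a rough Java sketch of that counting structure, just to show the idea (the class and method names and the toy records are mine, for illustration only; this isn't Mahout code):

import java.util.HashMap;
import java.util.Map;

/**
 * Rough sketch of a RainForest AVC-set: for one attribute at one tree
 * node, aggregate the number of records per (attribute value, class
 * label) pair. Only these counts have to fit in memory; the records
 * themselves can be streamed from disk.
 */
public class AvcSet {

    // attribute value -> (class label -> record count)
    private final Map<String, Map<String, Long>> counts = new HashMap<>();

    /** Feed one record's (attribute value, class label) projection into the set. */
    public void add(String attributeValue, String classLabel) {
        counts.computeIfAbsent(attributeValue, v -> new HashMap<String, Long>())
              .merge(classLabel, 1L, Long::sum);
    }

    /** Number of records seen with the given value and class label. */
    public long count(String attributeValue, String classLabel) {
        Map<String, Long> byClass = counts.get(attributeValue);
        if (byClass == null) {
            return 0L;
        }
        Long c = byClass.get(classLabel);
        return c == null ? 0L : c;
    }

    public static void main(String[] args) {
        AvcSet outlook = new AvcSet();
        // In a real pass, these pairs would come from scanning the
        // on-disk split, once per level of the tree being built.
        outlook.add("sunny", "play");
        outlook.add("sunny", "no-play");
        outlook.add("rain", "no-play");
        outlook.add("sunny", "play");
        System.out.println(outlook.count("sunny", "play"));   // prints 2
        System.out.println(outlook.count("rain", "no-play")); // prints 1
    }
}

Split selection then only needs these aggregated counts (impurity measures like Gini or information gain can be computed directly from them), which is what lets the approach handle datasets that don't fit in memory.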
"RainForest - A Framework for Fast Decision Tree Construction of Large Datasets" On Sun, Aug 14, 2011 at 11:28 AM, Xiaobo Gu <[email protected]> wrote: > Can you share the idea, I'll try to understand, and would like to help > writing some code. > > Regards, > > On Sun, Aug 14, 2011 at 6:23 PM, deneche abdelhakim <[email protected]> > wrote: > > Ted gave a very good summary of the situation. I do have plans to get rid > of > > the memory limitation and already started working on a solution, but > > unfortunately I am lacking the necessary time and motivation to get it > done > > :( > > > > On Sun, Aug 14, 2011 at 11:12 AM, Xiaobo Gu <[email protected]> > wrote: > > > >> Do you have any plan to get rid of the memory limitation in Random > Forest? > >> > >> Regards, > >> > >> Xiaobo Gu > >> > >> On Thu, Jul 7, 2011 at 11:48 PM, Ted Dunning <[email protected]> > >> wrote: > >> > The summary of the reason is that this was a summer project and > >> > parallelizing the random forest algorithm at all was a big enough > >> project. > >> > > >> > Writing a single pass on-line algorithm was considered a bit much for > the > >> > project size. Figuring out how to make multiple passes through an > input > >> > split was similarly out of scope. > >> > > >> > If you have a good alternative, this would be of substantial interest > >> > because it could improve the currently limited scalability of the > >> decision > >> > forest code. > >> > > >> > On Thu, Jul 7, 2011 at 8:20 AM, Xiaobo Gu <[email protected]> > >> wrote: > >> > > >> >> Why can't a tree be built against a dataset resides on the disk as > >> >> long as we can read it ? > >> >> > >> > > >> > > >
