Re: Out-of-core random forest implementation

Marty Kube Wed, 20 Feb 2013 05:26:12 -0800

Hi Lorenz,

Very interesting, that's what I was asking for when I mentioned non-MR implementations :-)

I have not looked at Spark before; interesting that it uses Mesos for clustering. I'll check it out.


On 02/19/2013 09:32 PM, Lorenz Knies wrote:

Hi Marty,

I am currently working on a PLANET-like implementation on top of Spark:
http://spark-project.org

I think this framework is a nice fit for the problem.
If the input data fits into the "total cluster memory" you benefit from the 
caching of the RDDs.

regards,

lorenz


On Feb 20, 2013, at 2:42 AM, Marty Kube <[email protected]> wrote:

You had mentioned other "resource management" platforms like Giraph or Mesos.  
I haven't looked at those yet.  I guess I was thinking of other parallelization frameworks.

It's interesting that the PLANET folks thought it was really worthwhile working 
on top of map reduce for all of the resource management that is built in.


On 02/19/2013 08:04 PM, Ted Dunning wrote:

If non-MR means map-only job with communicating mappers and a state store,
I am down with that.

What did you mean?

On Tue, Feb 19, 2013 at 5:53 PM, Marty Kube <[email protected]> wrote:

Right now I'd lean towards the planet model, or maybe a non-MR
implementation.  Anyone have a good idea for a non-MR solution?
