Re: Out-of-core random forest implementation

Andy Twigg Wed, 20 Feb 2013 06:28:47 -0800

Why don't we start from

https://github.com/ashenfad/hadooptree ?


On 20 February 2013 13:25, Marty Kube <[email protected]> wrote:
> Hi Lorenz,
>
> Very interesting, that's what I was asking for when I mentioned non-MR
> implementations :-)
>
> I have not looked at Spark before; it's interesting that it uses Mesos for
> cluster management. I'll check it out.
>
>
> On 02/19/2013 09:32 PM, Lorenz Knies wrote:
>>
>> Hi Marty,
>>
>> I am currently working on a PLANET-like implementation on top of Spark:
>> http://spark-project.org
>>
>> I think this framework is a nice fit for the problem.
>> If the input data fits into the "total cluster memory", you benefit from
>> the caching of the RDDs.
>>
>> regards,
>>
>> lorenz
>>
>>
>> On Feb 20, 2013, at 2:42 AM, Marty Kube <[email protected]>
>> wrote:
>>
>>> You had mentioned other "resource management" platforms like Giraph or
>>> Mesos. I haven't looked at those yet. I guess I was thinking of other
>>> parallelization frameworks.
>>>
>>> It's interesting that the PLANET folks thought it was really worthwhile
>>> working on top of MapReduce for all of the resource management that is
>>> built in.
>>>
>>>
>>> On 02/19/2013 08:04 PM, Ted Dunning wrote:
>>>>
>>>> If non-MR means a map-only job with communicating mappers and a state
>>>> store, I am down with that.
>>>>
>>>> What did you mean?
>>>>
>>>> On Tue, Feb 19, 2013 at 5:53 PM, Marty Kube <
>>>> [email protected]> wrote:
>>>>
>>>>> Right now I'd lean towards the planet model, or maybe a non-MR
>>>>> implementation.  Anyone have a good idea for a non-MR solution?
>>>>>
>



--
Dr Andy Twigg
Junior Research Fellow, St John's College, Oxford
Room 351, Department of Computer Science
http://www.cs.ox.ac.uk/people/andy.twigg/
[email protected] | +447799647538
