Re: since spark can not parallelize/serialize functions, how to distribute algorithms on the same data?

2016-03-28 Thread charles li
Hi, Pal, thanks a lot, this can indeed help me.

On Mon, Mar 28, 2016 at 10:44 PM, Sujit Pal wrote:
> Hi Charles,
>
> I tried this with dummied-out functions which just sum transformations of
> a list of integers, maybe they could be replaced by algorithms in your
> case.

Re: since spark can not parallelize/serialize functions, how to distribute algorithms on the same data?

2016-03-28 Thread Sujit Pal
Hi Charles,

I tried this with dummied-out functions which just sum transformations of a list of integers; maybe they could be replaced by algorithms in your case. The idea is to call them through a "god" function that takes an additional type parameter and delegates out to the appropriate
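A minimal local sketch of the dispatch pattern Sujit describes: the "algorithms" below are dummied-out sums over a list of integers, and a single "god" function delegates by a type parameter. All names here (`algo_sum_squares`, `algo_sum_doubles`, `run_algorithm`) are illustrative, not from the original reply.

```python
# Dummied-out "algorithms": each just sums a transformation of the data.
def algo_sum_squares(data):
    return sum(x * x for x in data)

def algo_sum_doubles(data):
    return sum(2 * x for x in data)

# Registry mapping a type parameter to the algorithm it names.
ALGORITHMS = {
    "squares": algo_sum_squares,
    "doubles": algo_sum_doubles,
}

def run_algorithm(algo_name, data):
    """The single serializable entry point: delegates to the algorithm
    selected by algo_name."""
    return ALGORITHMS[algo_name](data)

# Locally, run every registered algorithm over the same data. In Spark one
# would instead parallelize the algorithm names and map run_algorithm over
# them, e.g. sc.parallelize(list(ALGORITHMS)).map(lambda a: run_algorithm(a, data)).
results = {name: run_algorithm(name, [1, 2, 3]) for name in ALGORITHMS}
```

Because only `run_algorithm` crosses the driver/executor boundary, the individual algorithms never need to be serialized separately.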

Re: since spark can not parallelize/serialize functions, how to distribute algorithms on the same data?

2016-03-28 Thread Holden Karau
You probably want to look at the map transformation, and the many more defined on RDDs. The function you pass in to map is serialized and the computation is distributed.

On Monday, March 28, 2016, charles li wrote:
> use case: have a dataset, and want to use different
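Holden's point is that the function handed to map is itself serialized and shipped to executors (Spark uses cloudpickle for this). As a local stand-in, the sketch below round-trips a plain top-level function through `pickle`, the simplest serializable case, and then maps it over a list the way `rdd.map(transform)` would over a partition.

```python
import pickle

def transform(x):
    # A trivial top-level function; these pickle by reference, which is
    # the easy case (Spark's cloudpickle also handles lambdas/closures).
    return x * x

payload = pickle.dumps(transform)        # roughly what gets shipped to a worker
restored = pickle.loads(payload)         # what the executor reconstructs
mapped = list(map(restored, [1, 2, 3]))  # local analogue of rdd.map(transform)
```

The takeaway: Spark cannot put functions *inside* an RDD, but it freely serializes the function you pass *to* a transformation.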

since spark can not parallelize/serialize functions, how to distribute algorithms on the same data?

2016-03-28 Thread charles li
use case: I have a dataset and want to run different algorithms on it, then fetch the results. To do this, I think I should distribute my algorithms and run them on the dataset at the same time, am I right? But it seems that Spark can not parallelize/serialize
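The use case can be sketched locally: several algorithms run concurrently over the same dataset. Here a thread pool plays the role that parallelizing the algorithm list (with the dataset broadcast to executors) would play in Spark; the function names are illustrative placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

dataset = list(range(10))  # the shared data every algorithm consumes

def algo_total(data):
    return sum(data)

def algo_maximum(data):
    return max(data)

# The list of algorithms to fan out; in Spark this list, not the dataset,
# is what gets parallelized.
algorithms = [algo_total, algo_maximum]

with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda f: f(dataset), algorithms))
```

Each algorithm sees the full dataset, and the results come back in the same order as the algorithm list.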