Re: Incrementally load big RDD file into Memory

2015-04-09 Thread MUHAMMAD AAMIR
Hi, Thanks a lot for such a detailed response. On Wed, Apr 8, 2015 at 8:55 PM, Guillaume Pitel guillaume.pi...@exensa.com wrote: Hi Muhammad, There are lots of ways to do it. My company actually develops a text mining solution which embeds a very fast Approximate Neighbours solution (a

Re: Incrementally load big RDD file into Memory

2015-04-08 Thread Guillaume Pitel
Hi Muhammad, There are lots of ways to do it. My company actually develops a text mining solution which embeds a very fast Approximate Neighbours solution (a demo with real-time queries on the Wikipedia dataset can be seen at wikinsights.org). For the record, we are now preparing a dataset of 4.5

RE: Incrementally load big RDD file into Memory

2015-04-07 Thread java8964
cartesian is an expensive operation. If you have 'M' records in locations, then locations.cartesian(locations) will generate M x M results. If locations is a big RDD, it is hard to run locations.cartesian(locations) efficiently. Yong Date: Tue, 7 Apr 2015 10:04:12 -0700 From:
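
For illustration, a minimal Spark (Scala) sketch of the quadratic blow-up Yong describes. The sample "locations" contents and the SparkContext named sc are assumptions made for the example, not taken from the thread:

    // Assumed setup: an existing SparkContext named sc and a tiny sample RDD.
    val locations = sc.parallelize(Seq(
      (1, (40.7, -74.0)),
      (2, (34.1, -118.2)),
      (3, (41.9, -87.6))
    ))

    // cartesian pairs every record with every other record,
    // so M input records produce M x M output pairs.
    val pairs = locations.cartesian(locations)

    println(pairs.count())  // 3 records in -> 9 pairs out; at M = 1,000,000 this would be 10^12 pairs

This is why the thread points toward alternatives (such as approximate-neighbour techniques) rather than an all-pairs cartesian on a large RDD.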