On Mon, Apr 28, 2014 at 8:22 AM, Sung Hwan Chung
coded...@cs.stanford.edu wrote:
e.g. something like
rdd.mapPartitions((rows: Iterator[String]) => {
  var idx = 0
  rows.map((row: String) => {
    // hypothetical API: fetch a persistent per-worker map
    val valueMap = SparkWorker.getMemoryContent(valMap)
    val prevVal = valueMap(idx)
    idx += 1
    ...
Actually, I do not know how to do something like this, or whether it is
possible at all, hence the tentative suggestion.
Can you already declare persistent memory objects per worker? I tried
something like constructing a singleton object within map functions, but
that didn't work as it seemed to
A mutable map in an object should do what you're looking for then, I believe.
You just reference the object in your closure, so it won't be swept up when
the closure is serialized, and you can then reference the object's variables
on the remote host. e.g.:
object MyObject {
  val mmap = scala.collection.mutable.Map[Int, Double]()
}
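A minimal, self-contained sketch of this singleton pattern (plain Scala, no Spark; the names `WorkerState` and `update` are illustrative, and the per-worker behavior is an assumption based on how Spark serializes closures):

```scala
// Sketch (assumption): a singleton object holding mutable per-JVM state.
// On a Spark worker, each executor JVM loads its own copy of this object;
// a closure that references it serializes only the reference, not the map.
import scala.collection.mutable

object WorkerState {
  val mmap = mutable.Map[Int, Double]()
}

// A function value that captures WorkerState by reference, much as a
// map function would inside a Spark job.
val update = (i: Int, v: Double) => WorkerState.mmap(i) = v

update(0, 1.5)
update(1, 2.5)
println(WorkerState.mmap(0)) // 1.5
```

Because the object lives in the executor JVM rather than in the serialized closure, successive tasks on the same worker see the same map, which is what makes it usable as "persistent memory" per worker.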
As to your last line: I've used RDD zipping to avoid GC since MyBaseData is
large and doesn't change. I think this is a very good solution to what is
being asked for.
On Mon, Apr 28, 2014 at 10:44 AM, Ian O'Connell i...@ianoconnell.com wrote:
A mutable map in an object should do what your
I'm not sure what I said came through. RDD zip is not hacky at all, as it
only depends on the user not changing the partitioning. Basically, you would
keep your losses as an RDD[Double], zip those with the RDD of examples,
and update the losses. You're doing a copy (and GC) on the RDD of
Right---They are zipped at each iteration.
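The zip-per-iteration pattern described above can be sketched with plain Scala collections (an admitted simplification: in Spark the two sequences would be an RDD[Example] and an RDD[Double] with identical partitioning, combined with RDD.zip; the `Example` class, `step` function, and squared-error loss here are illustrative assumptions):

```scala
// Sketch: keep the losses separate from the (large, immutable) examples
// and produce a fresh losses collection each iteration. Only the small
// losses structure is copied and garbage-collected, not the examples.
case class Example(feature: Double, label: Double)

def step(examples: Seq[Example], losses: Seq[Double], weight: Double): Seq[Double] =
  examples.zip(losses).map { case (ex, prevLoss) =>
    val pred = weight * ex.feature
    val residual = ex.label - pred
    residual * residual // new loss; prevLoss is available for warm starts
  }

val examples = Seq(Example(1.0, 2.0), Example(2.0, 3.0))
val losses0  = Seq(0.0, 0.0)
val losses1  = step(examples, losses0, 1.0)
println(losses1) // List(1.0, 1.0)
```

Each iteration replaces the losses wholesale instead of mutating the examples in place, which preserves RDD immutability while still reusing the previous iteration's values.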
On Mon, Apr 28, 2014 at 11:56 AM, Chester Chen chesterxgc...@yahoo.com wrote:
Tom,
Are you suggesting two RDDs, one with the losses and another for the rest of
the info, using zip to tie them together, but doing the update on the loss RDD (the copy)?
Chester
Ian, I tried playing with your suggestion, but I get a task not
serializable error (and some obvious things didn't fix it). Can you get
that working?
On Mon, Apr 28, 2014 at 10:58 AM, Tom Vacek minnesota...@gmail.com wrote:
As to your last line: I've used RDD zipping to avoid GC since
That might be a good alternative to what we are looking for. But I wonder
if this would be as efficient as we want. For instance, will RDDs of the
same size usually get partitioned to the same machines, thus not
triggering any cross-machine aligning, etc.? We'll explore it, but I would
still
If you create your auxiliary RDD as a map from the examples, the
partitioning will be inherited.
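The point about inherited partitioning can be sketched with plain collections (a stand-in: in Spark, `examples.map(f)` yields an RDD with the same partitioning and per-partition element counts as `examples`, so `examples.zip(aux)` needs no cross-machine alignment; the names below are illustrative):

```scala
// Sketch: derive the auxiliary values from the examples themselves.
// Because the auxiliary data is a map over the examples, it has one
// element per example in the same order/layout, so zipping them back
// together is a purely local, per-partition operation in Spark.
val examples      = Seq("a b", "c d e", "f")
val initialLosses = examples.map(_ => 0.0) // one slot per example
val zipped        = examples.zip(initialLosses)
println(zipped) // List((a b,0.0), (c d e,0.0), (f,0.0))
```

This is why creating the auxiliary RDD with `map` (rather than parallelizing a separate collection) sidesteps the alignment concern raised above.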
On Mon, Apr 28, 2014 at 12:38 PM, Sung Hwan Chung
coded...@cs.stanford.edu wrote:
That might be a good alternative to what we are looking for. But I wonder
if this would be as efficient as we want
Hi Todd,
As Patrick and you already pointed out, it's really dangerous to mutate the
state of an RDD. However, when we implement glmnet in Spark, if we can
reuse the residuals for each row in the RDD computed in the previous step, it
can speed things up 4~5x.
As a result, we add an extra column in the RDD for