off-heap RDDs

Imran Rashid Sun, 25 Aug 2013 15:26:42 -0700

Hi,

I was wondering if anyone has thought about putting the cached data of an
RDD into off-heap memory, e.g. with direct byte buffers.  For really
long-lived RDDs that use a lot of memory, this seems like a huge
improvement, since all of that memory would be ignored entirely during GC.
(Reading data from direct byte buffers is potentially faster as
well, but that's just a nice bonus.)


The easiest thing to do is to store memory-serialized RDDs in direct
byte buffers, but I guess we could also store the serialized RDD on
disk and use a memory-mapped file.  Serializing into off-heap buffers
is a really simple patch; I just changed a few lines (I haven't done
any real tests with it yet, though).  But I don't really have a ton of
experience with off-heap memory, so I thought I would ask what others
think of the idea: whether it makes sense, whether there are any gotchas I
should be aware of, etc.
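
To make the two options concrete, here is a rough standalone sketch in
plain Java.  It is not the patch itself: the class and method names
(OffHeapSketch, toDirect, toMapped) are hypothetical, and it assumes
plain Java serialization rather than whatever serializer Spark is
configured with.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; none of these names exist in Spark.
final class OffHeapSketch {

    // Serialize on the heap first (plain Java serialization is an assumption).
    static byte[] serialize(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Option 1: copy the serialized bytes into a direct (off-heap) buffer.
    static ByteBuffer toDirect(byte[] serialized) {
        ByteBuffer direct = ByteBuffer.allocateDirect(serialized.length);
        direct.put(serialized);
        direct.flip(); // position = 0, limit = length: ready for reading
        return direct;
    }

    // Option 2: write the serialized bytes through a memory-mapped file.
    static ByteBuffer toMapped(byte[] serialized, Path file) {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            MappedByteBuffer mapped =
                channel.map(FileChannel.MapMode.READ_WRITE, 0, serialized.length);
            mapped.put(serialized);
            mapped.flip();
            return mapped; // the mapping stays valid after the channel is closed
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Copy the bytes back onto the heap and deserialize; works for either buffer.
    static Object deserialize(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate(); // leave the caller's position untouched
        byte[] serialized = new byte[view.remaining()];
        view.get(serialized);
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(serialized))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In both cases only the small ByteBuffer wrapper object lives on the
heap; the serialized bytes themselves sit outside it, which is the
point of the exercise.  Note that this sketch still copies through an
on-heap byte[] on both the write and the read path, so it says nothing
about read speed.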

thanks,
Imran
