Hi, I was wondering if anyone has thought about putting cached data in an RDD into off-heap memory, eg. w/ direct byte buffers. For really long-lived RDDs that use a lot of memory, this seems like a huge improvement, since all the memory is now totally ignored during GC. (and reading data from direct byte buffers is potentially faster as well, buts thats just a nice bonus).
The easiest thing to do is to store memory-serialized RDDs in direct byte buffers, but I guess we could also store the serialized RDD on disk and use a memory mapped file. Serializing into off-heap buffers is a really simple patch, I just changed a few lines (I haven't done any real tests w/ it yet, though). But I dont' really have a ton of experience w/ off-heap memory, so I thought I would ask what others think of the idea, if it makes sense or if there are any gotchas I should be aware of, etc. thanks, Imran
