Yeah, I remember now that Spark uses Unsafe. I haven't noticed how much it uses Arrow, or whether Arrow is just for PySpark, but that is another option.
Do you agree that this makes sense to look into, then, Trevor? I think there is an easy optimization here for one of the next releases, maybe 14.3 or 14.4/15.0. Maybe we can find a GSoC student who would be interested in this. It's a short topic that would let someone focus on several different areas and learn a lot in one summer. I think the most commonly useful interview questions are about native memory, JNI, or device memory; I feel you can find out a lot in one question.

On Fri, 8 Nov 2019 08:38:22 -0600, dev wrote:

I was at a Meetup last night that was talking about how Spark does this natively in some cases with sun.misc.Unsafe (which is being removed in Java 11) and how Flink does this with DirectByteBuffer (I think?), which has numerous benefits (that's what the Meetup talk was about).

On Sun, Nov 3, 2019 at 8:19 PM Andrew Palumbo (via Google Docs) <andrewpalumbo2...@gmail.com> wrote:

I've shared an item with you:

Proposal for a New Serialization, De-serialization and Memory Scheme for [Spark] distribution of Mahout AbstractVectors and Mahout Abstract Matrices: Direct to Main (off heap), JVM (on heap), GPU and other devices
https://docs.google.com/document/d/18RybVEpjqjDU_cCzwM6dtS3ZZkd1tDvCYlkhqzloS-4/edit?usp=sharing&ts=5dbf8960

It's not an attachment -- it's stored online. To open this item, just click the link above.