1. For executor memory, we have spark.executor.memory for the heap size and 
spark.memory.offHeap.size for the off-heap size; together, these two make up 
the total memory consumption of each executor process.
From the user's side, what they usually care about is the total memory 
consumption, regardless of whether it is on-heap or off-heap. It seems 
friendlier to expose only one memory config to the user.
Can we merge the two configs into one and hide the complexity inside the 
internal system?
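
For concreteness, here is a minimal sketch (Scala, with purely illustrative 
sizes) of how the two configs combine today; the per-executor footprint is 
roughly their sum:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.memory", "4g")           // on-heap (JVM heap) size
      .set("spark.memory.offHeap.enabled", "true")  // required for offHeap.size to take effect
      .set("spark.memory.offHeap.size", "2g")       // off-heap size managed by Spark's MemoryManager

    // Total per executor: roughly 4g (heap) + 2g (off-heap) = 6g,
    // before memory overhead allowances.

A merged config would presumably let the user set the 6g total once and leave 
the on-heap/off-heap split to Spark internally.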
2. spark.memory.offHeap.size was originally designed for the MemoryManager, 
which manages off-heap memory explicitly allocated by Spark itself when 
creating its own buffers/pages or caching blocks; it does not account for 
off-heap memory used by lower-level code or third-party libraries, for 
example Netty. But spark.memory.offHeap.size and spark.memory.offHeap.enabled 
are more or less confusing. Sometimes users ask: "I've already set 
spark.memory.offHeap.enabled to false, so why is Netty reading remote blocks 
into off-heap memory?" I also think we need to document 
spark.memory.offHeap.size and spark.memory.offHeap.enabled better on 
http://spark.apache.org/docs/latest/configuration.html
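
To make the confusion concrete: spark.memory.offHeap.enabled only governs 
memory that Spark's MemoryManager allocates itself, while Netty's direct 
buffers are controlled by separate knobs. A hedged sketch (illustrative 
values):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // Keeps the MemoryManager's execution/storage memory on-heap only.
      .set("spark.memory.offHeap.enabled", "false")
      // Netty can still allocate direct buffers; this separate setting asks
      // it to prefer heap buffers for shuffle I/O instead.
      .set("spark.shuffle.io.preferDirectBufs", "false")
      // JVM-level cap on all direct allocations in executors.
      .set("spark.executor.extraJavaOptions", "-XX:MaxDirectMemorySize=1g")

So a user who disables spark.memory.offHeap.enabled has not disabled off-heap 
usage by the networking layer; documenting that distinction on the 
configuration page would help.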
