Thanks for the input, Micah.


So it's now clearer whether we need to use Bits.java. If Netty is considered
optional, maybe it's more acceptable to just use Bits.java, since the Dataset
module is built-in? That way we can treat all built-in off-heap memory
allocation as direct memory allocation.
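For illustration only, here is a minimal sketch of what "treat all built-in off-heap allocation as direct memory" means in accounting terms: every allocation path reserves against one shared budget, so the limit applies to the sum. The class and method names are hypothetical. The real counter lives in `java.nio.Bits`, which is package-private in the JDK and cannot be called directly, so a plain `AtomicLong` stands in for it here.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch: one off-heap budget shared by every built-in
 * allocation path (JVM direct buffers and Dataset buffers alike).
 * It stands in for the package-private counter in java.nio.Bits.
 */
public final class SharedDirectBudget {
    private final long limit;                 // plays the role of -XX:MaxDirectMemorySize
    private final AtomicLong reserved = new AtomicLong();

    public SharedDirectBudget(long limit) {
        this.limit = limit;
    }

    /** Reserves {@code size} bytes; returns false rather than exceeding the limit. */
    public boolean tryReserve(long size) {
        if (size < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        while (true) {
            long current = reserved.get();
            long next = current + size;
            if (next > limit) {
                return false;
            }
            // CAS loop so concurrent reservations never overshoot the limit.
            if (reserved.compareAndSet(current, next)) {
                return true;
            }
        }
    }

    /** Returns previously reserved bytes to the budget. */
    public void release(long size) {
        reserved.addAndGet(-size);
    }

    public long reserved() {
        return reserved.get();
    }

    public static void main(String[] args) {
        SharedDirectBudget budget = new SharedDirectBudget(1024);
        // Both kinds of allocation draw from the same budget.
        boolean byteBuffer = budget.tryReserve(600);    // e.g. a JVM direct ByteBuffer
        boolean datasetBuffer = budget.tryReserve(600); // e.g. a buffer imported from C++
        System.out.println(byteBuffer + " " + datasetBuffer + " " + budget.reserved());
    }
}
```

Running `main` prints `true false 600`: the second reservation is refused because the two allocation paths share one limit, which is the behavior option 1 would give us.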


Hongze


At 2020-07-21 11:48:36, "Micah Kornfield" <emkornfi...@gmail.com> wrote:
>I don't have deep expertise here, but I think we should not choose either of
>the Netty-based options. There has been a decent amount of work to decouple
>Arrow from any hard Netty dependencies.
>
>On Mon, Jul 20, 2020 at 3:52 AM Hongze Zhang <notify...@126.com> wrote:
>
>> Hi,
>>
>> I'd like to follow up on the discussion[1] in the pending PR[2] for the
>> Java Dataset API (it's no longer "Datasets", I guess?).
>>
>>
>> - Background:
>>
>> We are transferring C++ Arrow buffers to Java-side BufferAllocators. We
>> need to decide whether to use -XX:MaxDirectMemorySize as a limit for these
>> buffers, and if so, what the desired solution should be.
>>
>> - Possible alternative solutions so far:
>>
>> 1. Reserve memory via Bits.java on the Java side
>>
>> Pros: Shares the memory counter with JVM direct byte buffers, no JNI
>> overhead, less code
>> Cons: More invocations (one call to Bits#reserveMemory per buffer)
>>
>> 2. Reserve memory via Bits.java on the C++ side
>>
>> Pros: Shares the memory counter with JVM direct byte buffers, fewer
>> invocations (e.g. if using jemalloc, we could make a single call per
>> underlying chunk)
>> Cons: JNI overhead, more code
>>
>> 3. Reserve memory via Netty's PlatformDependent.java on the Java side
>>
>> Pros: Shares the memory counter with Netty-based buffers, no JNI overhead,
>> less code
>> Cons: More invocations
>>
>> 4. Reserve memory via Netty's PlatformDependent.java on the C++ side
>>
>> Pros: Shares the memory counter with Netty-based buffers, fewer invocations
>> Cons: JNI overhead, more code
>>
>> 5. Implement none of the above and respect only the BufferAllocator's
>> limit.
>>
>>
>> So far I prefer option 5: not using any of these solutions. I am not sure
>> "direct memory" is a good indicator for these off-heap buffers, because we
>> would ultimately have to choose between sharing a counter with JVM direct
>> byte buffers or with Netty-based buffers. As far as I can tell, a complete
>> solution would ideally either have a global counter for all types of
>> off-heap buffers, or give each type an individual counter.
>>
>> Do you have any thoughts or suggestions on this topic? It would be
>> great if we could reach a conclusion soon, as the PR has been blocked for
>> some time. Thanks in advance :)
>>
>>
>> Best,
>> Hongze
>>
>> [1] https://github.com/apache/arrow/pull/7030#issuecomment-657096664
>> [2] https://github.com/apache/arrow/pull/7030
>>
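To make the "more invocations" vs. "fewer invocations" trade-off between the Java-side options (1 and 3) and the C++-side options (2 and 4) concrete, here is a rough sketch. It is hypothetical throughout: the class, the counter, and the 1024-byte chunk size are illustrative assumptions, and the chunked mode only approximates what a chunk-based allocator such as jemalloc would do.

```java
import java.util.List;

/**
 * Hypothetical sketch comparing reservation granularity against a shared
 * counter: one call per buffer vs. one call per underlying chunk.
 */
public final class ReservationGranularity {
    /** Counts reservation calls and reserved bytes. */
    static final class CountingCounter {
        long reservedBytes;
        int calls;

        void reserve(long size) {
            reservedBytes += size;
            calls++;
        }
    }

    /** Options 1/3: one reservation call per buffer. */
    static CountingCounter perBuffer(List<Long> bufferSizes) {
        CountingCounter counter = new CountingCounter();
        for (long size : bufferSizes) {
            counter.reserve(size);
        }
        return counter;
    }

    /** Options 2/4: one reservation call per chunk; buffers are carved from it. */
    static CountingCounter perChunk(List<Long> bufferSizes, long chunkSize) {
        CountingCounter counter = new CountingCounter();
        long remainingInChunk = 0;
        for (long size : bufferSizes) {
            if (size > chunkSize) {
                // Oversized buffers get a dedicated reservation of their own.
                counter.reserve(size);
                continue;
            }
            if (size > remainingInChunk) {
                counter.reserve(chunkSize); // grab a fresh chunk
                remainingInChunk = chunkSize;
            }
            remainingInChunk -= size;
        }
        return counter;
    }

    public static void main(String[] args) {
        List<Long> sizes = List.of(64L, 64L, 128L, 256L, 512L, 64L);
        CountingCounter a = perBuffer(sizes);
        CountingCounter b = perChunk(sizes, 1024);
        System.out.println("per-buffer: " + a.calls + " calls, " + a.reservedBytes + " bytes");
        System.out.println("per-chunk:  " + b.calls + " calls, " + b.reservedBytes + " bytes");
    }
}
```

For these six buffers the per-buffer mode makes 6 calls and reserves exactly 1088 bytes, while the per-chunk mode makes only 2 calls but reserves 2048 bytes, because the counter sees whole chunks rather than the bytes actually handed out.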
