Worth also observing that the Java community has been recognizant of the
issues with the current approach around direct byte buffer and GC, and has
come up with a proposal/implementation to avoid interaction with GC and
give more direct control. The proposal (actually, its 3rd iteration) is
described here at https://openjdk.java.net/jeps/393, and has been available
as an incubator feature for several JDK releases (Javadoc:
https://docs.oracle.com/en/java/javase/17/docs/api/jdk.incubator.foreign/jdk/incubator/foreign/package-summary.html
)

Laurent


On Thu, Oct 7, 2021 at 4:40 PM Jacques Nadeau <jacq...@apache.org> wrote:

> Clearly this patch was driven by an implicit set of needs but it's hard to
> guess at what they are. As Laurent asked, what is the main goal here? There
> may be many ways to solve this goal.
>
> Some thoughts in general:
> - The allocator is a finely tuned piece of multithreaded machinery that is
> used on hundreds of thousands of vm's ever year. As such, refactors should
> be avoided without a clear pressing need. When a need exists, it should be
> discussed extensively and thoroughly designed/vetted before proposing any
> patches.
> - Having used the library extensively, I can state that the library is
> already too heap impacting given the nature of the objects. Increasing GC
> load/reliance is actually in direct conflict to many of the past
> optimizations of the allocator and vector objects.
> - As far as I can tell, a new allocation manager could do auto-releases.
> Why go beyond introducing a new allocation manager (or even a backstopping
> allocation manager facade). The allocation manager was actually built to
> solve for things like memory moving between C and java.
> - It isn't clear here how much this is proposed as a backstop mechanism
> versus a primary release mechanism. I'm more in support of an optional
> backstop mechanism so that production apps don't leak (with appropriate
> logged warnings, etc).
> - Relying on GC to release direct memory frequently leads to early OOMs
> (exactly why the release mechanism was in place and is chosen for high perf
> apps like Netty and Arrow). It is fairly easy to allocate a bunch of direct
> memory but the GC never kicks in because heap memory is not under load.
>
> Lastly, I agree with Micah on the library use confusion. If we have close()
> methods everywhere and they javadoc they are required to release memory but
> then we implement something that makes that possibly not true (or not true
> in some contexts), wowser on the confusion.
>
>
>
>
> On Thu, Oct 7, 2021 at 3:55 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
> > In addition to the points raised by Laurent, I'm concerned about having
> two
> > possible idioms in the same library.  It means code written against the
> > library becomes less portable (you need to know how the memory allocator
> is
> > using GC or not).
> >
> > I understand manual memory management in Java is tedious but is there a
> > specific problem this is addressing other than making Arrow have more
> > expected semantics to Java users?  Are there maybe alternative solutions
> to
> > addressing the problem?
> >
> >
> > -Micah
> >
> >
> >
> > On Thu, Oct 7, 2021 at 3:48 PM Hongze Zhang <notify...@126.com> wrote:
> >
> > > We don't have to concern about that since no difference will be made on
> > > current manual release path unless "MemoryChunkCleaner" is explicitly
> > > specified when creating BufferAllocators. Manual ref counting will be
> > only
> > > discard/ignored in "MemoryChunkCleaner", while by default
> > > "MemoryChunkManager" is used which is totally the same as current.
> > >
> > >
> > >
> > >
> > > For example, to enable GC-based functionalities, the BufferAllocator
> has
> > > to be created using the following way:
> > >
> > >
> > >
> > >
> > > BufferAllocator root = new RootAllocator(
> > >
> > >         BaseAllocator.configBuilder()
> > >
> > >             .maxAllocation(MAX_ALLOCATION)
> > >
> > >             .memoryChunkManagerFactory(MemoryChunkCleaner.newFactory())
> > //
> > > to enable GC-based buffer release
> > >
> > >             .listener(MemoryChunkCleaner.gcTrigger()) // to trigger JVM
> > GC
> > > when allocator is full
> > >
> > >             .build());
> > >
> > > Thanks,
> > >
> > > Hongze
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > At 2021-10-07 12:53:23, "Laurent Goujon" <laur...@dremio.com> wrote:
> > > >Unless I missed something in the big picture, but it seems that using
> a
> > GC
> > > >based approach would make things much inefficient in terms of memory
> > > >management and would cause probably quite the confusion for all the
> > > systems
> > > >out there which rely on refcounting for keeping things in check, I'm
> not
> > > >sure why changing the default is such a good idea...
> > > >
> > > >On Tue, Oct 5, 2021 at 2:20 AM Hongze Zhang <notify...@126.com>
> wrote:
> > > >
> > > >> Hi Laurent,
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> Sorry I might describe it unclearly and yes, the purpose is
> > > >> straightforward: to discard (in another strategy rather than
> default)
> > > the
> > > >> usage of Java buffer ref counting and rely on GC to do cleanup for
> the
> > > >> buffers. Though some design changes to Allocator APIs maybe required
> > to
> > > >> achieve this, for example, the decoupling of MemoryChunkAllocator
> from
> > > >> AllocationManager.
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> Should we make more discussion on it, or I can open a ticket asap?
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Hongze
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> At 2021-10-05 13:39:55, "Laurent Goujon" <laur...@dremio.com>
> wrote:
> > > >> >Hi,
> > > >> >
> > > >> >I gave a quick look at your patches but to be honest, it's not easy
> > to
> > > >> >figure out the problem you're trying to solve in the first place.
> Or
> > > is it
> > > >> >simply to discard the use of ref counting to track buffer usages
> and
> > > >> >relying on Java references to keep track of buffers at the expense
> of
> > > >> >creating more pressure on the GC itself to collect and free
> buffers?
> > > >> >
> > > >> >On Wed, Sep 29, 2021 at 11:58 PM Hongze Zhang <notify...@126.com>
> > > wrote:
> > > >> >
> > > >> >> Hi,
> > > >> >>
> > > >> >> I would like to discuss on the potential of introducing a
> GC-based
> > > >> >> reference management strategy to Arrow Java, and we
> > > >> >> have already been working on an implementation in our own
> project.
> > I
> > > >> have
> > > >> >> put the related codes in following branch and
> > > >> >> if it makes sense to upstream Apache Arrow I can open a PR for
> it:
> > > >> >>
> > > >> >>
> > > >> >>
> https://github.com/zhztheplayer/arrow-1/commits/wip-chunk-cleaner
> > > >> >>
> > > >> >> In the branch, regarding Commit 1, Commit 2 and Commit 3:
> > > >> >>
> > > >> >> Commit 1: To break AllocationManager to two components:
> > > >> >> MemoryChunkManager[1] which maintains BufferLedger-
> > > >> >> BufferAllocator mappings, and MemoryChunkAllocator[2] which
> > performs
> > > the
> > > >> >> underlying allocate/destory operations. The
> > > >> >> previous customizations such as NettyAllocationManager,
> > > >> >> UnsafeAllocationManager, are moved to MemoryChunkAllocator API,
> > > >> >> as NettyMemoryChunkAllocator and UnsafeMemoryChunkAllocator.
> > > >> >>
> > > >> >> Commit 2: To introduce a new implementation of
> MemoryChunkManager,
> > > >> >> MemoryChunkCleaner[3]. By default, MemoryChunkCleaner
> > > >> >> converts all managed chunks to WeakReferences, and a global
> thread
> > > will
> > > >> >> observe on the assigned reference queue to
> > > >> >> release the unused garbage chunks which are enqueued by JVM GC.
> > Some
> > > >> >> modes[4] are there to provide different strategies,
> > > >> >> such as to disable manual reference management (default), or to
> > > enable
> > > >> >> manual and GC-based reference management at the
> > > >> >> same time, or to report leaks only.
> > > >> >>
> > > >> >> Commit 3: Add API "ArrowBuf buffer(MemoryChunk chunk)" to
> > > >> BufferAllocator.
> > > >> >> This makes some special workloads, e.g.
> > > >> >> cross-JNI buffer sharing much easier as we will no longer need an
> > > >> >> Allocator directly binding to the shared memory
> > > >> >> chunks, and will still be able to leverage all of Allocator's
> > > advantages
> > > >> >> to manage the chunks (E.g. to use
> > > >> >> MemoryChunkCleaner).
> > > >> >>
> > > >> >> Some possible Q&As about this implementation:
> > > >> >>
> > > >> >> 1. Why adding MemoryChunkAllocator, rather than just customizing
> > > >> >> AllocationManager?
> > > >> >> It is to reuse all of current underlying chunk allocation
> > strategies,
> > > >> >> Netty, Unsafe, and others. Also, with the layer of
> > > >> >> MemoryChunk separated from AllocationManager we will be able to
> > > create
> > > >> >> MemoryChunks actively in some special cases, e.g.
> > > >> >> cross-JNI buffer sharing, C data interface buffer importing, then
> > > >> populate
> > > >> >> the chunks to a manager BufferAllocator.
> > > >> >>
> > > >> >> 2. Why using WeakReference rather than PhantomReference?
> > > >> >> In Java 8, PhantomReference has some issue against its referent
> > > object.
> > > >> >> The object cannot be garbage collected before
> > > >> >> being enqueued. In WeakReference we don't have this issue. See
> > > ref[5].
> > > >> >>
> > > >> >> 3. Why not sun.misc.Cleaner?
> > > >> >> Cleaner API maintains a global doubly linked list to keep the
> > Cleaner
> > > >> >> instances alive. This brings overheads to us since
> > > >> >> we will create local doubly linked list for cleaning up all the
> > > buffers
> > > >> on
> > > >> >> Allocator close. See a unit test[6].
> > > >> >>
> > > >> >> 4. Why closing all buffers on Allocator close?
> > > >> >> This behavior can be customizabale within Mode[7]s. A principle
> is,
> > > when
> > > >> >> relying on GC, we should allow closing buffers
> > > >> >> manually, or at least closing all of them on Allocator close. Or
> > the
> > > >> >> actual release time of the underlying chunks will
> > > >> >> be unpredictable.
> > > >> >>
> > > >> >> 5. Can we use heap based buffers?
> > > >> >> If I am not wrong, no. The heap objects can be physically moved
> > > around
> > > >> by
> > > >> >> JVM. The addresses can vary.
> > > >> >>
> > > >> >> 6. When should GC happen?
> > > >> >> A new AllocationListener implementation, GCTriger[8] is
> introduced.
> > > GC
> > > >> >> will be performed when BufferAllocator is full.
> > > >> >>
> > > >> >> 7. Performance?
> > > >> >> Based on previous measurement, it doesn't bring overheads on the
> > > legacy
> > > >> >> path (by using default MemoryChunkManager). For
> > > >> >> GC-based path (MemoryChunkCleaner + Mode.GC_ONLY), allocations
> can
> > be
> > > >> >> comparatively slower due to higher GC pressure
> > > >> >> (For example, in out Arrow-based query engine, to run TPC-DS
> > SF1000,
> > > the
> > > >> >> overhead can be up to 3%). I can collect more
> > > >> >> benchmark results in future, maybe under a JIRA ticket or a PR.
> > > >> >>
> > > >> >> 8. Breaking changes?
> > > >> >> It doesn't break library-based allocation strategy selection like
> > > >> directly
> > > >> >> including arrow-memory-unsafe.jar to
> > > >> >> projects. However it breaks use of the previous
> > > >> AllocationManager.Factory
> > > >> >> API. Users should migrate to
> > > >> >> MemoryChunkAllocator API. Which in my opinion is simple to do.
> > > >> >>
> > > >> >> The implementation is still evolving, so if the links get
> > out-of-date
> > > >> you
> > > >> >> can check on the newest branch head. Any
> > > >> >> suggestions welcome.
> > > >> >>
> > > >> >> Some other discussions that may be related to this topic:
> > > >> >>
> > > >> >>
> > > >> >>
> > > >>
> > >
> >
> https://lists.apache.org/thread.html/r09b2e7dd8e342818190c332ee20d97f68816241d0c683e17c4f35dc6%40%3Cdev.arrow.apache.org%3E
> > > >> >>
> > > >> >>
> > > >> >>
> > > >>
> > >
> >
> https://lists.apache.org/thread.html/rc956b16b7ed9768cce8eaecafdc363f7b3808dafe6edf1940b139ecb%40%3Cuser.arrow.apache.org%3E
> > > >> >>
> > > >> >> Best,
> > > >> >> Hongze
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> [1]
> > > >> >>
> > > >>
> > >
> >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/
> > > >> >> MemoryChunkManager.java
> > > >> >> [2]
> > > >> >>
> > > >> >>
> > > >>
> > >
> >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkAllocator.java
> > > >> >>
> > > >> >> [3]
> > > >> >>
> > > >> >>
> > > >>
> > >
> >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java
> > > >> >> [4]
> > > >> >>
> > > >> >>
> > > >>
> > >
> >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
> > > >> >> [5]
> > > >> >>
> > > >> >>
> > > >>
> > >
> >
> https://stackoverflow.com/questions/56684150/why-since-java-9-phantomreference-java-doc-states-that-it-is-dedicated-to-the-po
> > > >> >> [6]
> > > >> >>
> > > >> >>
> > > >>
> > >
> >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/test/java/org/apache/arrow/memory/TestMemoryChunkCleaner.java#L204
> > > >> >> [7]
> > > >> >>
> > > >> >>
> > > >>
> > >
> >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L503
> > > >> >> [8]
> > > >> >>
> > > >> >>
> > > >>
> > >
> >
> https://github.com/zhztheplayer/arrow-1/blob/52e5def7a239164b5c13457c8be38e227bb2907d/java/memory/memory-core/src/main/java/org/apache/arrow/memory/MemoryChunkCleaner.java#L402
> > > >> >>
> > > >> >>
> > > >> >>
> > > >>
> > >
> >
>

Reply via email to