lidavidm commented on a change in pull request #12603: URL: https://github.com/apache/arrow/pull/12603#discussion_r828399202
########## File path: docs/source/java/memory.rst ########## @@ -0,0 +1,208 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +================= +Memory Management +================= + +The memory modules contain all the functionality that Arrow uses to manage memory (allocation and deallocation). +This section will introduce you to the major concepts in Java’s memory management: + +* `ArrowBuf`_ +* `BufferAllocator`_ +* `Reference counting`_ + +.. contents:: + +Getting Started +=============== + +Arrow's memory management is built around the needs of the columnar format and using off-heap memory. +Also, it is its own independent implementation, and does not wrap the C++ implementation. + +Arrow provides multiple modules: the core interfaces, and implementations of the interfaces. +Users need the core interfaces, and exactly one of the implementations. + +* ``memory-core``: Provides the interfaces used by the Arrow libraries and applications. +* ``memory-netty``: An implementation of the memory interfaces based on the `Netty`_ library. +* ``memory-unsafe``: An implementation of the memory interfaces based on the `sun.misc.Unsafe`_ library. 
+ +ArrowBuf +======== + +ArrowBuf represents a single, contiguous region of `direct memory`_. It consists of an address and a length, +and provides low-level interfaces for working with the contents, similar to ByteBuffer. + +Unlike (Direct)ByteBuffer, it has reference counting built in, as discussed later. + +Why Arrow Uses Direct Memory +---------------------------- + +* The JVM can optimize I/O operations when using direct memory/direct buffers; it will attempt to avoid copying buffer contents to/from an intermediate buffer. This can speed up IPC in Arrow. +* Since Arrow always uses direct memory, JNI modules can directly wrap native memory addresses instead of copying data. We use this in modules like the C Data Interface. +* Conversely, on the C++ side of the JNI boundary, we can directly access the memory in ArrowBuf without copying data. + +BufferAllocator +=============== + +The `BufferAllocator`_ interface deals with allocating ArrowBufs for the application. + +.. code-block:: Java + + import org.apache.arrow.memory.ArrowBuf; + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + try(BufferAllocator bufferAllocator = new RootAllocator(8 * 1024)){ + ArrowBuf arrowBuf = bufferAllocator.buffer(4 * 1024); + System.out.println(arrowBuf); + arrowBuf.close(); + } + +.. code-block:: + + ArrowBuf[2], address:140363641651200, length:4096 + +The concrete implementation of the BufferAllocator interface is `RootAllocator`_. Applications should generally create +one RootAllocator at the start of the program, and use it through the BufferAllocator interface. Allocators implement +AutoCloseable and must be closed after the application is done with them; this will check that all outstanding memory +has been freed (see the next section). + +Arrow provides a tree-based model for memory allocation. The RootAllocator is created first, then more allocators +are created as children of an existing allocator via `newChildAllocator`_. 
When creating a RootAllocator or a child +allocator, a memory limit is provided, and when allocating memory, the limit is checked. Furthermore, when allocating +memory from a child allocator, those allocations are also reflected in all parent allocators. Hence, the RootAllocator +effectively sets the program-wide memory limit, and serves as the master bookkeeper for all memory allocations. + +Child allocators are not strictly required, but can help better organize code. For instance, a lower memory limit can +be set for a particular section of code. When the allocator is closed, it then checks that that section didn't leak any +memory. And child allocators can be named, which makes it easier to tell where an ArrowBuf came from during debugging. + +Reference counting +================== + +Direct memory is more expensive to allocate and deallocate. That's why allocators pool or cache direct buffers. + +Because we want to pool/cache buffers and manage them deterministically, we use manual reference counting instead of +the garbage collector. This simply means that each buffer has a counter keeping track of the number of references to +the buffer, and the user is responsible for properly incrementing/decrementing the counter as the buffer is used. + +In Arrow, each ArrowBuf has an associated `ReferenceManager`_ that tracks the reference count, which can be retrieved +with ArrowBuf.getReferenceManager(). The reference count can be updated with ``ReferenceManager.release`` and +``ReferenceManager.retain``. + +Of course, this is tedious and error-prone, so usually, instead of directly working with buffers, we should use +higher-level APIs like ValueVector. Such classes generally implement Closeable/AutoCloseable and will automatically +decrement the reference count when closed. + +Allocators implement AutoCloseable as well. In this case, closing the allocator will check that all buffers +obtained from the allocator are closed. 
If not, ``close()`` method will raise an exception; this helps track +memory leaks from unclosed buffers. + +As you see, reference counting needs to be handled carefully. To ensure that an +independent section of code has fully cleaned up all allocated buffers, use a new child allocator. + +Development Guidelines +====================== + +Applications should generally: + +* Use the BufferAllocator interface in APIs instead of RootAllocator. +* Create one RootAllocator at the start of the program. +* ``close()`` allocators after use (whether they are child allocators or the RootAllocator), either manually or preferably via a try-with-resources statement. + +Debugging Memory Leaks/Allocation +================================= + +Allocators have a debug mode that makes it easier to figure out where a leak is originated. +To enable it, enable assertions with ``-ea`` or set the system property, ``-Darrow.memory.debug.allocator=true``. +When enabled, a log will be kept of allocations. + +Arrow modules define simple logging facade for java SLF4J, configure it properly to see your logs (e.g. Logback/Log4J). Review comment: We don't define slf4j. ########## File path: docs/source/java/memory.rst ########## @@ -0,0 +1,208 @@ +Debugging Memory Leaks/Allocation +================================= + +Allocators have a debug mode that makes it easier to figure out where a leak is originated. 
+To enable it, enable assertions with ``-ea`` or set the system property, ``-Darrow.memory.debug.allocator=true``. +When enabled, a log will be kept of allocations. + +Arrow modules define simple logging facade for java SLF4J, configure it properly to see your logs (e.g. Logback/Log4J). Review comment: ```suggestion Arrow logs some allocation information via SLF4J; configure it properly to see these logs (e.g. via Logback/Apache Log4j). ``` ########## File path: docs/source/java/memory.rst ########## @@ -0,0 +1,208 @@ +Debugging Memory Leaks/Allocation +================================= + +Allocators have a debug mode that makes it easier to figure out where a leak is originated. +To enable it, enable assertions with ``-ea`` or set the system property, ``-Darrow.memory.debug.allocator=true``. +When enabled, a log will be kept of allocations. + +Arrow modules define simple logging facade for java SLF4J, configure it properly to see your logs (e.g. Logback/Log4J). + +Consider the following example to see how debug enabled help us with the tracking of allocators: + +.. 
code-block:: Java + + import org.apache.arrow.memory.BufferAllocator; + import org.apache.arrow.memory.RootAllocator; + import org.apache.arrow.vector.IntVector; + + try (BufferAllocator bufferAllocator = new RootAllocator(Integer.MAX_VALUE)) { + final int QUANTITY = 5; + try (IntVector intVector = new IntVector("int-01", bufferAllocator)) { + intVector.allocateNew(QUANTITY); + for (int i = 0; i < QUANTITY; i++) { + intVector.set(i, i); + } + intVector.setValueCount(QUANTITY); + } + // Fix the next code!, it is only to see the track of allocators when debug is enabled + IntVector intVectorV = new IntVector("int-02", bufferAllocator); + intVectorV.allocateNew(QUANTITY); + for (int i = 0; i < QUANTITY; i++) { + intVectorV.set(i, i); + } + intVectorV.setValueCount(QUANTITY); + + BufferAllocator childAllocator = bufferAllocator.newChildAllocator("child-isolated", 0, + Integer.MAX_VALUE / 4); + IntVector intVectorV2 = new IntVector("int-isolated-01", childAllocator); + intVectorV2.allocateNew(QUANTITY); + for (int i = 0; i < QUANTITY; i++) { + intVectorV2.set(i, i); + } + } Review comment: We should only use ArrowBuf here, don't use IntVector or anything as they haven't been introduced yet. Also, it's still not clear to me what the debug mode is doing here. From what I recall, the information it provides is basically only available in a debugger. If there is something the debug mode is doing here, we need to point it out explicitly. Ideally we would compare the information available with and without debug mode. ########## File path: docs/source/java/memory.rst ########## @@ -0,0 +1,208 @@ +.. 
code-block:: + + 15:49:32,755 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback-test.xml] at [file:/Users/java/source/demo/target/classes/logback-test.xml] + 15:49:32,924 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.apache.arrow] to DEBUG + 11:56:48.944 [main] INFO o.apache.arrow.memory.BaseAllocator - Debug mode enabled. + Exception in thread "main" java.lang.IllegalStateException: Allocator[ROOT] closed with outstanding child allocators. + Allocator(ROOT) 0/64/64/2147483647 (res/actual/peak/limit) + child allocators: 1 + Allocator(child-isolated) 0/32/32/536870911 (res/actual/peak/limit) + child allocators: 0 + ledgers: 1 + ledger[3] allocator: child-isolated), isOwning: , size: , references: 2, life: 246918908438818..0, allocatorManager: [, life: ] holds 3 buffers. + ArrowBuf[10], address:140408097079352, length:8 + ArrowBuf[8], address:140408097079328, length:32 + ArrowBuf[9], address:140408097079328, length:24 + reservations: 0 + ledgers: 1 + ledger[2] allocator: ROOT), isOwning: , size: , references: 2, life: 246tors can be named; this makes it easier to tell where an Arro918906331643..0, allocatorManager: [, life: ] holds 3 buffers. Review comment: Seems something got copy-pasted into the middle here? ########## File path: docs/source/java/memory.rst ########## @@ -0,0 +1,174 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. 
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +================= +Memory Management +================= + +.. contents:: + +The memory modules contain all the functionality that Arrow uses to manage memory (allocation and deallocation). +This section will introduce you to the major concepts in Java’s memory management: + +* `BufferAllocator`_ +* `ArrowBuf`_ +* `Reference counting`_ + +Getting Started +=============== + +Arrow's memory management is built around the needs of the columnar format and using off-heap memory. +Also, it is its own independent implementation, and does not wrap the C++ implementation. + +Arrow offers a high level of abstraction providing several access APIs to read/write data into a direct memory. + +Arrow provides multiple modules: the core interfaces, and implementations of the interfaces. +Users need the core interfaces, and exactly one of the implementations. + +* ``Memory Core``: Provides the interfaces used by the Arrow libraries and applications. +* ``Memory Netty``: An implementation of the memory interfaces based on the `Netty`_ library. +* ``Memory Unsafe``: An implementation of the memory interfaces based on the `sun.misc.Unsafe`_ library. + +BufferAllocator +=============== + +The BufferAllocator interface deals with allocating ArrowBufs for the application. + +The concrete implementation of the allocator is RootAllocator. Applications should generally create one RootAllocator at the +start of the program, and use it through the BufferAllocator interface. Allocators have a memory limit. The RootAllocator +sets the program-wide memory limit. The RootAllocator is responsible for being the master bookkeeper for memory allocations. + +Arrow provides a tree-based model for memory allocation. 
The RootAllocator is created first, then all allocators +are created as children ``BufferAllocator.newChildAllocator`` of that allocator. + +One of the uses of child allocators is to set a lower temporary limit for one section of the code. Also, child +allocators can be named; this makes it easier to tell where an ArrowBuf came from during debugging. + +ArrowBuf +======== + +ArrowBuf represents a single, contiguous allocation of `Direct Memory`_. It consists of an address and a length, +and provides low-level interfaces for working with the contents, similar to ByteBuffer. + +The objects created using ``Direct Memory`` take advantage of native executions and it is decided natively by the JVM. Arrow +offer efficient memory operations base on this Direct Memory implementation (`see section below for detailed reasons of use`). + +Unlike (Direct)ByteBuffer, it has reference counting built in (`see the next section`). + +Reference counting +================== + +Direct memory involve more activities than allocate and deallocate because allocators (thru pool/cache) +allocate buffers (ArrowBuf). + +Arrow uses manual reference counting to track whether a buffer is in use, or can be deallocated or returned +to the allocator's pool. This simply means that each buffer has a counter keeping track of the number of references to +this buffer, and end user is responsible for properly incrementing/decrementing the counter according the buffer is used. + +In Arrow, each ArrowBuf has an associated ReferenceManager that tracks the reference count, which can be retrieved +with ArrowBuf.getReferenceManager(). The reference count can be updated with ``ReferenceManager.release`` and Review comment: Was this added? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
