nswamy closed pull request #13105: [MXNET-1158] JVM Memory Management Documentation
URL: https://github.com/apache/incubator-mxnet/pull/13105
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/scala-package/examples/scripts/run_train_mnist.sh b/scala-package/examples/scripts/run_train_mnist.sh
index ea53c1ade66..d27b7cbb365 100755
--- a/scala-package/examples/scripts/run_train_mnist.sh
+++ b/scala-package/examples/scripts/run_train_mnist.sh
@@ -19,15 +19,31 @@
 
 set -e
 
+hw_type=cpu
+if [[ $1 = gpu ]]
+then
+    hw_type=gpu
+fi
+
+platform=linux-x86_64
+
+if [[ $OSTYPE = darwin* ]]
+then
+    platform=osx-x86_64
+    hw_type=cpu
+fi
+
 MXNET_ROOT=$(cd "$(dirname $0)/../../.."; pwd)
 echo $MXNET_ROOT
-CLASS_PATH=$MXNET_ROOT/scala-package/assembly/linux-x86_64-cpu/target/*:$MXNET_ROOT/scala-package/examples/target/*:$MXNET_ROOT/scala-package/examples/target/classes/lib/*:$MXNET_ROOT/scala-package/infer/target/*
+CLASS_PATH=$MXNET_ROOT/scala-package/assembly/$platform-$hw_type/target/*:$MXNET_ROOT/scala-package/examples/target/*:$MXNET_ROOT/scala-package/examples/target/classes/lib/*
 
 # model dir
 DATA_PATH=$2
 
-java -XX:+PrintGC -Xms256M -Xmx512M -Dmxnet.traceLeakedObjects=false -cp $CLASS_PATH \
-        org.apache.mxnetexamples.imclassification.TrainMnist \
-        --data-dir /home/ubuntu/mxnet_scala/scala-package/examples/mnist/ \
+java -XX:+PrintGC -Dmxnet.traceLeakedObjects=false -cp $CLASS_PATH \
+        org.apache.mxnetexamples.imclassification.TrainModel \
+        --data-dir $MXNET_ROOT/scala-package/examples/mnist/ \
+        --network mlp \
+        --num-layers 50 \
         --num-epochs 10000000 \
         --batch-size 1024
\ No newline at end of file
diff --git a/scala-package/memory-management.md b/scala-package/memory-management.md
new file mode 100644
index 00000000000..33c36b6e6ab
--- /dev/null
+++ b/scala-package/memory-management.md
@@ -0,0 +1,118 @@
+# JVM Memory Management
+The Scala and Java bindings of Apache MXNet use native memory (memory from the C++ heap in either RAM or GPU memory) for most of the MXNet objects, such as NDArray, Symbol, Executor, KVStore, and Data Iterators.
+The associated Scala classes act only as wrappers. The operations performed on these wrapper objects are directed to the high-performance MXNet C++ backend via the Java Native Interface (JNI). Therefore, the bytes are stored in the C++ native heap, which allows for fast access.
+
+However, the JVM Garbage Collector only manages objects allocated in the JVM heap and is not aware of the memory footprint of these objects in native memory. Hence, the allocation/deallocation of native memory must be managed by MXNet Scala.
+Allocating native memory is straightforward and is done during the construction of the object by calling the associated C++ API through JNI. However, since JVM languages do not have destructors, the deallocation of these objects must be done explicitly.
+MXNet Scala provides a few easy modes of operation, which are explained in detail below.
+
+## Memory Management in Scala
+### 1. [ResourceScope.using](https://github.com/apache/incubator-mxnet/blob/master/scala-package/core/src/main/scala/org/apache/mxnet/ResourceScope.scala#L106) (Recommended)
+`ResourceScope.using` provides the familiar Java try-with-resources primitive in Scala and will automatically manage the memory of all the MXNet objects created in the associated code block (`body`). It works by tracking the allocations performed inside the code block and deallocating them when exiting the block.
+Passing MXNet objects out of a using block can be easily accomplished by simply returning an object or an iterable containing multiple MXNet objects. If you have nested using blocks, the returned objects will be moved into the parent scope as well.
+
+**Usage** 
+```scala
+ResourceScope.using() {
+    val (r3, r4) = ResourceScope.using() {
+        val r1 = NDArray.ones(Shape(2, 2))
+        val r2 = NDArray.ones(Shape(3, 4))
+        val r3 = NDArray.ones(Shape(5, 6))
+        val r4 = NDArray.ones(Shape(7, 8))
+        (r3, r4)
+    }
+    r4
+}
+```
+In the example above, we have two ResourceScopes stacked together. In the inner scope, 4 NDArrays `(r1, r2, r3, r4)` are created, and the NDArrays `(r3, r4)` are returned. The inner ResourceScope recognizes that it should not deallocate these objects and automatically moves `r3` and `r4` to the outer scope. When the outer scope returns `r4` from its code block, it will only deallocate `r3` and will remove `r4` from its list of objects to be deallocated. All other objects are automatically released by calling the C++ backend to free the native memory.
+
+**Note:**
+You should consider nesting ResourceScopes when you have layers of functionality in your application code or when you create a lot of MXNet objects, such as NDArrays.
+For example, holding onto all the memory created over an entire training loop can result in running out of memory, especially when training on GPUs, which might only have 8 to 16 GB of memory.
+It is recommended not to use a single ResourceScope block that spans the entire training code. You should instead nest multiple scopes: an innermost scope where you run forward-backward passes on each batch, a middle scope for each epoch, and an outer scope that runs the entire training script. This is demonstrated in the example below:
+
+```scala
+ResourceScope.using() {            // outer scope: the entire training script
+  val m = Module()
+  m.bind()
+  val k = KVStore(...)
+  ResourceScope.using() {
+    val itr = MXIterator(..)
+    val numEpochs: Int = 100
+    // ...
+    for (i <- 0 until numEpochs) {
+      ResourceScope.using() {      // middle scope: one epoch
+        itr.reset()                // rewind the iterator at the start of each epoch
+        while (itr.hasNext) {
+          ResourceScope.using() {  // innermost scope: one forward-backward pass
+            val dataBatch = itr.next()
+            m.forward(dataBatch)
+            m.backward(dataBatch)
+            m.update()
+          }
+        }
+      }
+    }
+  }
+}
+```
+### 2. Using Phantom References (Recommended for some use cases)
+
+Apache MXNet uses [Phantom References](https://docs.oracle.com/javase/8/docs/api/java/lang/ref/PhantomReference.html) to track all MXNet objects that have native memory associated with them.
+When the Garbage Collector runs, it identifies unreachable Scala/Java objects in the JVM heap and finalizes them.
+It then enqueues objects which are ready to be reclaimed into a reference queue. We take advantage of this and perform a pre-mortem cleanup on these wrapper objects by freeing the corresponding native memory as well.
+ 
+This approach is automatic and does not require any special code to clean up the native memory. However, the Garbage Collector is not aware of the potentially large amount of native memory used, and therefore may not free memory often enough with its standard behavior.
+You can control the frequency of garbage collection by calling `System.gc()` at strategic points, such as the end of an epoch or the end of a mini-batch.
+
+This approach could be suitable for some use cases, such as inference on CPUs where you have a large amount of memory (RAM) on your system.
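+
+For example, here is a minimal sketch of triggering garbage collection at epoch boundaries. It is illustrative only and assumes an already-bound `Module` with an optimizer installed, plus a `DataIter` over the training data:
+```scala
+import org.apache.mxnet.DataIter
+import org.apache.mxnet.module.Module
+
+// Sketch: rely on Phantom-Reference cleanup, but nudge the GC once per epoch
+// so native memory is reclaimed more promptly.
+def fitWithGcHints(model: Module, trainIter: DataIter, numEpochs: Int): Unit = {
+  for (epoch <- 0 until numEpochs) {
+    trainIter.reset()
+    while (trainIter.hasNext) {
+      val batch = trainIter.next()
+      model.forward(batch)
+      model.backward()
+      model.update()
+    }
+    // Ask the JVM to collect unreachable NDArray wrappers; their Phantom
+    // References then let MXNet free the corresponding native memory.
+    System.gc()
+  }
+}
+```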
+
+**Note:**
+Calling GC too frequently can also cause your application to perform poorly. This approach might not be suitable for use cases which quickly allocate a large number of large NDArrays, such as when training a GAN model.
+
+### 3. Using the dispose Pattern (Least Recommended)
+ 
+There might be situations where you want to manually manage the lifecycle of Apache MXNet objects. For such use cases, we have provided the `dispose()` method, which will manually deallocate the associated native memory when called. We have also made all MXNet objects [AutoCloseable](https://docs.oracle.com/javase/8/docs/api/java/lang/AutoCloseable.html). If you are using Java 8 or above, you can use them with try-with-resources or call `close()` in the finally block.
+
+**Note:**
+We recommend you avoid manually managing MXNet objects and instead use `ResourceScope.using`. Manual management leads to less readable code and could leak memory if you miss calling `dispose()` (until it is eventually cleaned up by the Garbage Collector through the Phantom References).
+
+```scala
+def showDispose(): Unit = {
+    val r = NDArray.ones(Shape(2, 2))
+    r.dispose()
+}
+```
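+
+Since all MXNet objects are AutoCloseable, you can also release the native memory with `close()` in a finally block; a minimal sketch:
+```scala
+import org.apache.mxnet.{NDArray, Shape}
+
+def showClose(): Unit = {
+  val r = NDArray.ones(Shape(2, 2))
+  try {
+    println(r.shape)  // use the NDArray
+  } finally {
+    r.close()         // equivalent to dispose(): frees the native memory
+  }
+}
+```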
+
+## Memory Management in Java
+Memory management in MXNet Java is similar to Scala. We recommend you use [ResourceScope](https://github.com/apache/incubator-mxnet/blob/master/scala-package/core/src/main/scala/org/apache/mxnet/ResourceScope.scala#L32) in a try-with-resources block or in a try-finally block.
+The [try-with-resources statement](https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html) tracks the resources declared in the try block and automatically closes them upon exiting (supported from Java 7 onwards).
+The ResourceScope discussed above implements AutoCloseable and tracks all MXNet objects created at a thread-local scope level.
+
+```java
+try (ResourceScope scope = new ResourceScope()) {
+    NDArray test = NDArray.ones(new Shape(new int[]{2, 2}));
+}
+```
+or 
+```java
+ResourceScope scope = new ResourceScope();
+try {
+    NDArray test = NDArray.ones(new Shape(new int[]{2, 2}));
+} finally {
+    scope.close();
+}
+``` 
+
+**Note:**
+A ResourceScope within a try block tracks all MXNet native object allocations (NDArray, Symbol, Executor, etc.) and deallocates them at the end of the try block. This also applies to objects that are returned; in the example above, the native memory associated with `test` would be deallocated even if it were returned.
+If you use an object outside of the try block, the process might crash due to illegal memory access.
+
+To retain certain objects created within try blocks, you should explicitly remove them from the scope by calling `scope.moveToOuterScope`, as shown in the sketch below.
+It is highly recommended to nest multiple try-with-resources ResourceScopes so you do not have to explicitly manage the lifecycle of the native objects.
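+
+A minimal sketch (illustrative only; the exact `moveToOuterScope` signature is an assumption here):
+```java
+NDArray result;
+try (ResourceScope scope = new ResourceScope()) {
+    NDArray test = NDArray.ones(new Shape(new int[]{2, 2}));
+    // Hand `test` off to the parent scope so it is not deallocated
+    // when this block exits (assumed signature).
+    scope.moveToOuterScope(test);
+    result = test;
+}
+// `result` is still valid here; it is now tracked by the enclosing scope, if any.
+```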
+


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
