This is an automated email from the ASF dual-hosted git repository.
fmcquillan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git
The following commit(s) were added to refs/heads/master by this push:
new 2896c24 add clarification in DL user docs re GPU memory release
2896c24 is described below
commit 2896c24acba9f25cc30d1a412ee2d84cc4cf5187
Author: Frank McQuillan <[email protected]>
AuthorDate: Thu Mar 26 13:14:41 2020 -0700
add clarification in DL user docs re GPU memory release
---
.../postgres/modules/deep_learning/madlib_keras.sql_in | 15 ++++++++++++++-
.../deep_learning/madlib_keras_fit_multiple_model.sql_in | 15 ++++++++++++++-
2 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras.sql_in b/src/ports/postgres/modules/deep_learning/madlib_keras.sql_in
index e4794a3..75fa56a 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras.sql_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras.sql_in
@@ -84,9 +84,20 @@ but rather imported from an external source. This is in the section
called "Predict BYOM" below, where "BYOM" stands for "Bring Your Own Model."
Note that the following MADlib functions are targeting a specific Keras
-version (2.2.4) with a specific Tensorflow kernel version (1.14).
+version (2.2.4) with a specific TensorFlow kernel version (1.14).
Using a newer or older version may or may not work as intended.
+@note CUDA GPU memory cannot be released until the process holding it is terminated.
+When a MADlib deep learning function is called with GPUs, Greenplum internally
+creates a process (called a slice) which calls TensorFlow to do the computation.
+This process holds the GPU memory until one of two things happens:
+the query finishes and the user logs out of the Postgres client/session; or
+the query finishes and the user waits for the timeout set by `gp_vmem_idle_resource_timeout`.
+The default value for this timeout is 18 sec [8]. So the recommendation is:
+log out of/reconnect to the session after every GPU query; or
+wait for `gp_vmem_idle_resource_timeout` before running another GPU query (you
+can also set it to a lower value).
+
@anchor keras_fit
@par Fit
The fit (training) function has the following format:
@@ -1620,6 +1631,8 @@ http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
Yuhao Zhang, and Arun Kumar, Technical Report, Computer Science and
Engineering, University of California,
San Diego https://adalabucsd.github.io/papers/TR_2019_Cerebro.pdf.
+[8] Greenplum Database server configuration parameters https://gpdb.docs.pivotal.io/latest/ref_guide/config_params/guc-list.html
+
@anchor related
@par Related Topics
diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_fit_multiple_model.sql_in b/src/ports/postgres/modules/deep_learning/madlib_keras_fit_multiple_model.sql_in
index cd58d93..b929724 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras_fit_multiple_model.sql_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_fit_multiple_model.sql_in
@@ -94,6 +94,17 @@ release the disk space once the fit multiple query has completed execution.
This is not the case for GPDB 6+ where disk space is released during the
fit multiple query.
+@note CUDA GPU memory cannot be released until the process holding it is terminated.
+When a MADlib deep learning function is called with GPUs, Greenplum internally
+creates a process (called a slice) which calls TensorFlow to do the computation.
+This process holds the GPU memory until one of two things happens:
+the query finishes and the user logs out of the Postgres client/session; or
+the query finishes and the user waits for the timeout set by `gp_vmem_idle_resource_timeout`.
+The default value for this timeout is 18 sec [8]. So the recommendation is:
+log out of/reconnect to the session after every GPU query; or
+wait for `gp_vmem_idle_resource_timeout` before running another GPU query (you
+can also set it to a lower value).
+
@anchor keras_fit
@par Fit
The fit (training) function has the following format:
@@ -1381,10 +1392,12 @@ https://adalabucsd.github.io/papers/TR_2019_Cerebro.pdf
Geoffrey Hinton with Nitish Srivastava and Kevin Swersky,
http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
-[6] Deep learning section of Apache MADlib wiki, https://cwiki.apache.org/confluence/display/MADLIB/Deep+Learning
+[6] Deep learning section of Apache MADlib wiki https://cwiki.apache.org/confluence/display/MADLIB/Deep+Learning
[7] Deep Learning, Ian Goodfellow, Yoshua Bengio and Aaron Courville, MIT
Press, 2016.
+[8] Greenplum Database server configuration parameters https://gpdb.docs.pivotal.io/latest/ref_guide/config_params/guc-list.html
+
@anchor related
@par Related Topics
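
The recommendation in the added note (wait out `gp_vmem_idle_resource_timeout`, lower it, or reconnect) could be sketched as a psql session fragment. This snippet is illustrative only, not part of the commit; it assumes a Greenplum session with permission to change the parameter, whose unit is milliseconds, and the `3000` value is an arbitrary example:

```sql
-- Check the current idle-resource timeout (18000 ms = 18 sec by default [8]).
SHOW gp_vmem_idle_resource_timeout;

-- Optionally lower it so the slice processes (and the GPU memory they hold)
-- are released sooner after a GPU query finishes. Value is in milliseconds.
SET gp_vmem_idle_resource_timeout TO 3000;

-- Run the GPU query here, then either wait out the timeout or
-- disconnect/reconnect the session before issuing the next GPU query.
```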