This is an automated email from the ASF dual-hosted git repository.
fmcquillan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git
The following commit(s) were added to refs/heads/master by this push:
new 2896c24 add clarification in DL user docs re GPU memory release
2896c24 is described below
commit 2896c24acba9f25cc30d1a412ee2d84cc4cf5187
Author: Frank McQuillan <[email protected]>
AuthorDate: Thu Mar 26 13:14:41 2020 -0700
add clarification in DL user docs re GPU memory release
---
.../postgres/modules/deep_learning/madlib_keras.sql_in | 15 ++++++++++++++-
.../deep_learning/madlib_keras_fit_multiple_model.sql_in | 15 ++++++++++++++-
2 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras.sql_in b/src/ports/postgres/modules/deep_learning/madlib_keras.sql_in
index e4794a3..75fa56a 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras.sql_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras.sql_in
@@ -84,9 +84,20 @@ but rather imported from an external source. This is in the section
called "Predict BYOM" below, where "BYOM" stands for "Bring Your Own Model."
Note that the following MADlib functions are targeting a specific Keras
-version (2.2.4) with a specific Tensorflow kernel version (1.14).
+version (2.2.4) with a specific TensorFlow kernel version (1.14).
Using a newer or older version may or may not work as intended.
+@note CUDA GPU memory cannot be released until the process holding it is terminated.
+When a MADlib deep learning function is called with GPUs, Greenplum internally
+creates a process (called a slice) which calls TensorFlow to do the computation.
+This process holds the GPU memory until one of two things happens:
+the query finishes and the user logs out of the Postgres client/session; or
+the query finishes and the user waits for the timeout set by `gp_vmem_idle_resource_timeout`.
+The default value for this timeout is 18 sec [8]. So the recommendation is:
+log out of/reconnect to the session after every GPU query; or
+wait for `gp_vmem_idle_resource_timeout` before running another GPU query (you
+can also set it to a lower value).
+
@anchor keras_fit
@par Fit
The fit (training) function has the following format:
@@ -1620,6 +1631,8 @@ http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
Yuhao Zhang, and Arun Kumar, Technical Report, Computer Science and
Engineering, University of California,
San Diego https://adalabucsd.github.io/papers/TR_2019_Cerebro.pdf.
+[8] Greenplum Database server configuration parameters https://gpdb.docs.pivotal.io/latest/ref_guide/config_params/guc-list.html
+
@anchor related
@par Related Topics
diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_fit_multiple_model.sql_in b/src/ports/postgres/modules/deep_learning/madlib_keras_fit_multiple_model.sql_in
index cd58d93..b929724 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras_fit_multiple_model.sql_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_fit_multiple_model.sql_in
@@ -94,6 +94,17 @@ release the disk space once the fit multiple query has completed execution.
This is not the case for GPDB 6+ where disk space is released during the
fit multiple query.
+@note CUDA GPU memory cannot be released until the process holding it is terminated.
+When a MADlib deep learning function is called with GPUs, Greenplum internally
+creates a process (called a slice) which calls TensorFlow to do the computation.
+This process holds the GPU memory until one of two things happens:
+the query finishes and the user logs out of the Postgres client/session; or
+the query finishes and the user waits for the timeout set by `gp_vmem_idle_resource_timeout`.
+The default value for this timeout is 18 sec [8]. So the recommendation is:
+log out of/reconnect to the session after every GPU query; or
+wait for `gp_vmem_idle_resource_timeout` before running another GPU query (you
+can also set it to a lower value).
+
@anchor keras_fit
@par Fit
The fit (training) function has the following format:
@@ -1381,10 +1392,12 @@ https://adalabucsd.github.io/papers/TR_2019_Cerebro.pdf
Geoffrey Hinton with Nitish Srivastava and Kevin Swersky,
http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
-[6] Deep learning section of Apache MADlib wiki, https://cwiki.apache.org/confluence/display/MADLIB/Deep+Learning
+[6] Deep learning section of Apache MADlib wiki https://cwiki.apache.org/confluence/display/MADLIB/Deep+Learning
[7] Deep Learning, Ian Goodfellow, Yoshua Bengio and Aaron Courville, MIT
Press, 2016.
+[8] Greenplum Database server configuration parameters https://gpdb.docs.pivotal.io/latest/ref_guide/config_params/guc-list.html
+
@anchor related
@par Related Topics
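
The recommendation in the added note (wait out `gp_vmem_idle_resource_timeout`, lower it, or reconnect) could be sketched as a psql session fragment. This snippet is illustrative only, not part of the commit; it assumes a Greenplum session with permission to change the parameter, whose unit is milliseconds, and the `3000` value is an arbitrary example:

```sql
-- Check the current idle-resource timeout (18000 ms = 18 sec by default [8]).
SHOW gp_vmem_idle_resource_timeout;

-- Optionally lower it so the slice processes (and the GPU memory they hold)
-- are released sooner after a GPU query finishes. Value is in milliseconds.
SET gp_vmem_idle_resource_timeout TO 3000;

-- Run the GPU query here, then either wait out the timeout or
-- disconnect/reconnect the session before issuing the next GPU query.
```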