makemebitter commented on a change in pull request #425: DL: Add training for
multiple models
URL: https://github.com/apache/madlib/pull/425#discussion_r310793849
##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras.py_in
##########
@@ -589,31 +637,49 @@ def get_loss_metric_from_keras_eval(schema_madlib, table, compile_params,
def internal_keras_eval_transition(state, dependent_var, independent_var,
model_architecture, serialized_weights,
compile_params,
current_seg_id, seg_ids, images_per_seg,
- gpus_per_host, segments_per_host, **kwargs):
+ gpus_per_host, segments_per_host,
+ is_final, **kwargs):
SD = kwargs['SD']
device_name = get_device_name_and_set_cuda_env(gpus_per_host,
current_seg_id)
agg_loss, agg_metric, agg_image_count = state
- if not agg_image_count:
- set_keras_session(device_name, gpus_per_host, segments_per_host)
- model = model_from_json(model_architecture)
- compile_and_set_weights(model, compile_params, device_name,
- serialized_weights)
+ # User called evaluate will always set is_final to true.
+ # If is_final is false, that means the fit already created a session and a graph
+ # Otherwise, we may (last iteration of fit) or may not (user evaluate call)
+ # have a session.
+ if is_final and 'sess' not in SD:
+ sess = get_keras_session(device_name, gpus_per_host, segments_per_host)
+ SD['sess'] = sess
+ K.set_session(sess)
+ # Popping the segment model kept in the SD of internal_keras_eval_transition,
+ # which is leftover from the previous iteration. But the session is already
+ # cleared by fit_transition at this time, so the model cannot be re-used.
+ SD.pop('segment_model', None)
Review comment:
Yes, there are two separate SDs, one for eval_transition and one for
fit_transition. This del does not clear anything belonging to fit_transition.
It deletes the model left over in eval_transition's SD from the previous
iteration. By the last iteration, fit_transition has already deleted the
session, which also invalidates that model. Therefore, for this corner case,
we clear everything and reconstruct the session and model. In all other
iterations, eval_transition can simply reuse them.
In the current setting, there is only one session and two separate (albeit
semantically identical) graphs, one for fit_transition and one for
eval_transition. Closing the session, whether in fit_transition or
eval_transition, leaves both graphs unable to execute.
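To make the corner case concrete, here is a minimal sketch of the control
flow described above. It does not use TensorFlow or Keras: FakeSession,
FakeModel, the shared dict, and this simplified eval_transition are all
hypothetical stand-ins, and the only assumption is that a model cannot
execute once the single shared session has been closed.

```python
# Hypothetical stand-ins; none of these classes exist in MADlib or Keras.
class FakeSession:
    """Stands in for the single shared session."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class FakeModel:
    """Stands in for a model whose graph can only run in its session."""
    def __init__(self, sess):
        self.sess = sess

    def evaluate(self):
        if self.sess.closed:
            raise RuntimeError("session closed; model cannot execute")
        return "ok"


def eval_transition(SD, shared, is_final):
    """Simplified eval-side logic for one call.

    SD     -- the eval function's own cache dict
    shared -- dict holding the one live session (an assumption that
              models fit and eval running in the same session)
    """
    if is_final and 'sess' not in SD:
        # Corner case: build a fresh session and drop the model left
        # over from the previous iteration, since it is bound to a
        # session that may already be closed.
        SD['sess'] = shared['sess'] = FakeSession()
        SD.pop('segment_model', None)
    if 'segment_model' not in SD:
        SD['segment_model'] = FakeModel(shared['sess'])
    return SD['segment_model'].evaluate()


shared = {}
eval_SD = {}

# Iteration 1: fit opens the session; eval builds and caches its model.
shared['sess'] = FakeSession()
r1 = eval_transition(eval_SD, shared, is_final=False)
first_model = eval_SD['segment_model']

# Iteration 2 (not final): eval reuses the cached model.
r2 = eval_transition(eval_SD, shared, is_final=False)
reused = eval_SD['segment_model'] is first_model

# Final iteration: fit closes the session before eval runs.
shared['sess'].close()
stale_fails = False
try:
    first_model.evaluate()
except RuntimeError:
    stale_fails = True

# Eval detects the corner case, rebuilds the session and the model.
r3 = eval_transition(eval_SD, shared, is_final=True)
rebuilt = eval_SD['segment_model'] is not first_model
```

In this sketch the cached model is reused on every non-final call and is
rebuilt only once, after the session it depended on has been closed.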
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services