[GitHub] [madlib] reductionista commented on a change in pull request #355: Keras fit interface

GitBox Mon, 18 Mar 2019 19:13:55 -0700

reductionista commented on a change in pull request #355: Keras fit interface
URL: https://github.com/apache/madlib/pull/355#discussion_r266666847


 ##########
 File path: src/ports/postgres/modules/convex/madlib_keras_helper.py_in
 ##########
 @@ -0,0 +1,180 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import os
+import plpy
+from keras import backend as K
+from keras import utils as keras_utils
+from keras.optimizers import *
+import numpy as np
+
+#######################################################################
+########### Keras specific functions #####
+#######################################################################
+
+def get_device_name_for_keras(use_gpu, seg, gpus_per_host):
+    if use_gpu:
+        device_name = '/gpu:0'
+        os.environ["CUDA_VISIBLE_DEVICES"] = str(seg % gpus_per_host)
+    else: # cpu only
+        device_name = '/cpu:0'
+        os.environ["CUDA_VISIBLE_DEVICES"] = '-1'
+
+    return device_name
+
+def set_keras_session(use_gpu):
+    config = K.tf.ConfigProto()
+    if use_gpu:
+        config.gpu_options.allow_growth = False
+        config.gpu_options.per_process_gpu_memory_fraction = 0.9
+    session = K.tf.Session(config=config)
+    K.set_session(session)
+
+def clear_keras_session():
+    sess = K.get_session()
+    K.clear_session()
+    sess.close()
+
+def compile_and_set_weights(segment_model, compile_params, device_name,
+                            previous_state):
+    model_shapes = []
+    with K.tf.device(device_name):
+        compile_params = convert_string_of_args_to_dict(compile_params)
+        segment_model.compile(**compile_params)
+        # prev_segment_model.compile(**compile_params)
+        for a in segment_model.get_weights():
+            model_shapes.append(a.shape)
+
+        agg_loss, agg_accuracy, _, model_weights = deserialize_weights(
+            previous_state, model_shapes)
+        segment_model.set_weights(model_weights)
+    # prev_model.set_weights(model_weights)
+
+#######################################################################
+########### Helper functions to serialize and deserialize weights #####
+#######################################################################
+
+def deserialize_weights(model_state, model_shapes):
+    """
+    Parameters:
+        model_state: a stringified (serialized) state containing loss,
+        accuracy, buffer_count, and model_weights, passed from postgres
+        model_shapes: a list of tuples containing the shapes of each element
+        in keras.get_weights()
+    Returns:
+        buffer_count: the buffer count from state
+        model_weights: a list of numpy arrays that can be inputted into keras.set_weights()
+    """
+    if not model_state or not model_shapes:
+        return None
+    state = np.fromstring(model_state, dtype=np.float32)
+    model_weights_serialized = state[3:]
+    i, j, model_weights = 0, 0, []
+    while j < len(model_shapes):
+        next_pointer = i + reduce(lambda x, y: x * y, model_shapes[j])
+        weight_arr_portion = model_weights_serialized[i:next_pointer]
+        model_weights.append(weight_arr_portion.reshape(model_shapes[j]))
+        i, j = next_pointer, j + 1
+    return int(float(state[0])), int(float(state[1])), int(float(state[2])), model_weights
+
+def serialize_weights(loss, accuracy, buffer_count, model_weights):
+    """
+    Parameters:
+        loss, accuracy, buffer_count: float values
+        model_weights: a list of numpy arrays, what you get from
+        keras.get_weights()
+    Returns:
+        A stringified (serialized) state containing all these values, to be
+        passed to postgres
+    """
+    if model_weights is None:
+        return None
+    flattened_weights = [w.flatten() for w in model_weights]
+    model_weights_serialized = np.concatenate(flattened_weights)
+    new_model_string = np.array([loss, accuracy, buffer_count])
+    new_model_string = np.concatenate((new_model_string, model_weights_serialized))
+    new_model_string = np.float32(new_model_string)
+    return new_model_string.tostring()
+
+#######################################################################
+########### General Helper functions  #######
+#######################################################################
+
+def get_data_as_np_array(table_name, y, x, input_shape, num_classes):
+    """
+
+    :param table_name: Table containing the batch of images per row
+    :param y: Column name for y
+    :param x: Column name for x
+    :param input_shape: input_shape of data in array format [L , W , C]
+    :param num_classes: num of distinct classes in y
+    :return:
+    """
+    val_data_qry = "SELECT {0}, {1} FROM {2}".format(y, x, table_name)
+    input_shape = map(int, input_shape)
+    val_data = plpy.execute(val_data_qry)
+    indep_len = len(val_data[0][x])
+    pixels_per_image = int(input_shape[0] * input_shape[1] * input_shape[2])
+    x_validation = np.ndarray((0,indep_len, pixels_per_image))
+    y_validation = np.ndarray((0,indep_len))
+    for i in range(len(val_data)):
+        x_test = np.asarray((val_data[i][x],))
+        x_test = x_test.reshape(1, indep_len, pixels_per_image)
+        y_test = np.asarray((val_data[i][y],))
+        y_test = y_test.reshape(1, indep_len)
+        x_validation=np.concatenate((x_validation, x_test))
+        y_validation=np.concatenate((y_validation, y_test))
+    num_test_examples = x_validation.shape[0]
+    x_validation = x_validation.reshape(indep_len * num_test_examples, *input_shape)
+    x_validation = x_validation.astype('float64')
+    y_validation = y_validation.reshape(indep_len * num_test_examples)
+
+    x_validation = x_validation.astype('float64')
+    #x_validation /= 255.0
+    y_validation = keras_utils.to_categorical(y_validation, num_classes)
+
+    return x_validation, y_validation
+
+"""
+Used to convert compile_params and fit_params to actual argument dictionaries
+"""
+def convert_string_of_args_to_dict(str_of_args):
+    """Uses parenthases matching algorithm to intelligently convert
+    a string with valid python code into an argument dictionary"""
+    stack = []
+    dual = {
+        '(' : ')',
+        '[' : ']',
+        '{' : '}',
+    }
+    result_str = ""
+    for char in str_of_args:
+        if char in dual.keys():
+            stack.append(char)
+            result_str += char
+        elif char in dual.values() and stack:
+            if dual[stack[-1]] == char:
+                stack.pop(-1)
+            result_str += char
+        elif not stack and char == "=":
+            result_str += ":"
+        else:
+            result_str += char
+    return eval('{' + result_str + '}')
 
 Review comment:
   I'm not sure what the purpose of this function is.  Calling it and passing
   the result to `model.compile()` is very similar to something much simpler:
   ```
   eval('model.compile({0})'.format(str_of_args))
   ```
   The only difference I see is that, with this function, the user is required
   to add extra quotes around the keywords, so that the string looks more like
   a dict than keyword arguments.  Not only does this create more work for the
   user, it also makes everything messier and harder to read.  And if we're
   asking the user to format the string differently from how it's actually
   passed to the function anyway, why not just have them use colons instead of
   equals signs, adding { } around it to make it an actual dict?  (And not
   bother adding the extra quotes.)  That would be easier to read, and then we
   could just do:
   ```
   model.compile(**eval(str_of_args))
   ```
   directly, without calling any function to convert the args.  That is
   slightly simpler code than if they didn't alter the argument string at all,
   but in neither case do I see a need for a separate function to convert the
   compile param string they're passing in.
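
To illustrate the quoting point, here is a minimal sketch that repeats the function's character loop but stops before the final `eval()`, so the rewritten string can be inspected. The name `to_dict_source` is hypothetical and not part of the PR, and the sketch is written for Python 3 while the module under review targets Python 2.

```python
def to_dict_source(str_of_args):
    """Same rewrite as convert_string_of_args_to_dict, minus the final eval().

    Hypothetical helper for illustration only; not part of the PR.
    """
    stack = []
    dual = {'(': ')', '[': ']', '{': '}'}
    out = ""
    for char in str_of_args:
        if char in dual:
            stack.append(char)
            out += char
        elif char in dual.values() and stack:
            if dual[stack[-1]] == char:
                stack.pop()
            out += char
        elif not stack and char == "=":
            out += ":"  # only a top-level '=' is turned into ':'
        else:
            out += char
    return '{' + out + '}'


# With quoted keywords the rewritten string is a valid dict literal.
quoted = to_dict_source("'loss'='mse', 'metrics'=['accuracy']")
print(quoted)
print(eval(quoted))

# With ordinary keyword syntax the key becomes a bare name, so eval() fails.
unquoted = to_dict_source("loss='mse'")
try:
    eval(unquoted)
except NameError as exc:
    print("unquoted keyword fails:", exc)
```

An `=` nested inside brackets, as in `SGD(lr=0.01)`, is left alone because the stack is non-empty at that point.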
   
   Possibly the intention was to avoid calling `eval()` on a user-supplied
   string, which opens a security hole.  But if that's the purpose of this
   function, why do we still call `eval()` on the whole string at the end of
   it?  Same security hole either way.  (I do think it would be much better if
   we could avoid doing this, but I can't think of a way off the top of my
   head.)
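
One possible way to avoid `eval()` is sketched below, as an illustration rather than a proposal for this PR. It lets Python's own parser split the keyword arguments and then accepts only plain literals through `ast.literal_eval`. The function name `parse_literal_kwargs` and the dummy call name `_f` are hypothetical. The sketch assumes every value is a literal (string, number, list, dict, and so on), so an argument such as `optimizer=SGD(lr=0.01)` would be rejected and would need separate handling.

```python
import ast


def parse_literal_kwargs(str_of_args):
    """Parse "a=1, b='x'" into a dict without calling eval().

    Hypothetical helper: only keyword arguments with literal values are accepted.
    """
    # Wrap the string in a dummy call so the parser splits the arguments for us.
    tree = ast.parse("_f({0})".format(str_of_args), mode="eval")
    call = tree.body
    if not isinstance(call, ast.Call) or call.args:
        raise ValueError("only keyword arguments are supported")
    result = {}
    for kw in call.keywords:
        if kw.arg is None:  # a **mapping argument
            raise ValueError("** unpacking is not supported")
        # literal_eval raises ValueError for anything other than a plain literal.
        result[kw.arg] = ast.literal_eval(kw.value)
    return result


params = parse_literal_kwargs(
    "loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy']")
print(params)

# A value that is a function call is rejected instead of being executed.
try:
    parse_literal_kwargs("loss=__import__('os').getcwd()")
except ValueError as exc:
    print("rejected:", exc)
```

The resulting dict could then be passed as `model.compile(**params)`, with the user writing ordinary keyword syntax and no extra quotes around the keywords.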

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
