reductionista commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r586728227
##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
# Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):
if fit_param_str:
- fit_params_dict = convert_string_of_args_to_dict(fit_param_str)
-
- literal_eval_fit_params = ['batch_size','epochs','verbose',
+ fit_params_dict = convert_string_of_args_to_dict(fit_param_str,
strip_quotes=False)
+ literal_eval_fit_params = ['batch_size','epochs','verbose', 'shuffle',
Review comment:
We don't, but it makes the code less error prone and more easily
maintainable.
I think the question you're getting at, which I was very confused about when
I first looked at this code, is why were we handling it as a special case,
rather than just running literal eval on it like we do for any other param?
Previously we were adding quotes around the value if they weren't there
already, and then removing them later if the result is equal to the strings
'True' or 'False', with this awkward workaround:
```
if 'shuffle' in fit_params_dict:
shuffle_value = fit_params_dict['shuffle']
if shuffle_value == 'True' or shuffle_value == 'False':
fit_params_dict['shuffle'] = bool(shuffle_value)
```
But there is no reason to accept 'True' or 'False' as a param, as that's not
valid python code anyway. So instead of changing True to 'True' then back to
True again, we just leave it as is. If they pass in shuffle='True' or
shuffle="True" that should generate an error, just as it does if you passed
that as a fit param to keras outside of MADlib--it's not a valid parameter
value.
We do have to handle the callbacks fit param as a special case, since we
need to block any callbacks besides Tensorboard (at least for now).
So why were we doing a whole extra song and dance for shuffle as well? The
answer is not obvious, but I eventually found that it has to do with older
versions of keras allowing an alternate syntax for passing optimizer classes to
compile params.
In recent versions of keras, the optimizer classes are required to be passed
in the normal pythonic way, as class variables. But older versions also
allowed passing the name of the class as a string... which keras then converts
into the actual class before using.
How does that affect fit params? It doesn't, except for the fact that we
re-purposed a function that processes the compile params, which was designed in
an overly-general way to allow the value of any parameter to either have quotes
around it or not.
If my understanding is correct, the only reason this was necessary, even for
processing compile params, is to handle this special case of how optimizer
classes can be passed in older versions of keras. It's not necessary at all
for the fit params, since none of them can be passed in the alternate way, in
any version of keras. We were allowing some extra weird possibilities (like
passing shuffle='True' instead of shuffle=True) that keras doesn't normally
allow, just so that we didn't have to handle fit params differently from
compile params, or most compile params differently from optimizer params.
Hopefully eventually, we will be able to stop adding quotes to any of the
params passed in, and just use literal_eval on everything. But for now, it
seems like maintaining backwards compatibility for the optimizer params is a
good idea... so we still add quotes for all compile params, just not fit params
now. This removes the need to handle shuffle as a special case. As a next
step perhaps, we can do the same to either most or all of the compile params,
but I think it may require re-writing some tests.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]