reductionista opened a new pull request #560:
URL: https://github.com/apache/madlib/pull/560
Here are a few improvements to how we process compile params, so that the
code is more robust from a security perspective.
I haven't found a specific security exploit for these issues, but I
wouldn't be surprised if someone with enough time and dedication could find a
way to gain admin access via them... especially if there were a minor update to
either MADlib's deep learning library or keras which doesn't take into account
these potential issues. (And at the very least, this fixes some odd error
messages and crashes that can happen when strange parameters are passed.)
Changes:
1. Exclude `deserialize`, `serialize`, `get`, and anything starting with
`_` from the list of builtin losses and metrics we support.
We were unintentionally allowing users to pass `keras.losses.deserialize`,
`keras.losses._sys`, `keras.losses.get`, `keras.losses.__builtins__`, etc. as
loss functions to keras, and the same for `keras.metrics.*` for metric
functions. Most of what's in `keras.losses.*` are loss functions or classes
deriving from `keras.losses.Loss`, and most of what's in `keras.metrics` are
metric functions or classes deriving from `keras.metrics.Metric`, but there are
also these extra classes intended for other purposes (not intended for use
directly as metrics or losses).
The `deserialize()` function is particularly dangerous, in part because they
accept exactly 2 arguments which means, even if you just pass something as
simple as `loss=deserialize`, it will be called by keras itself during
evaluation as `deserialize(y_true, y_pred)`. And in part because the purpose
of the `deserialize()` utility function is to take a string from the user and
turn it into an instance of a callable class, which will then be called by
keras. In other words, it's a way of allowing users to pass in their own
custom functions, but in a way such that MADlib only sees it as a `str` to pass
along (rather than an arbitrary callable python object, which we do not allow
ordinary users to pass through to keras).
The danger is significantly mitigated by the fact that we don't allow any
characters after the function name to be passed, even as a string. For
example, if the user were allowed to pass `loss=deserialize(..., ...)` then it
would be fairly easy to gain admin access. But we only allow
`loss=deserialize` which makes it far more difficult if not impossible to
exploit. Nevertheless, this PR completely disables the use of such functions
for losses or metrics, in case there is some way to trick keras into passing
malicious arguments for y_true and y_pred.
2. Another potentially risky behavior was the input validator allowing the
user to pass as a metric any string which contains the substring
`top_k_categorical_accuracy`. This provides a loophole that circumvents the
intention to only allow metric names which are either a builtin keras metric or
something in the custom functions object table. For example, if the user
passes `metrics=[system("echo allow all >>
${MASTER_DATA_DIRECTORY}/pg_hba.conf")] /* top_k_categorical_accuracy */,
losses=...` for compile params, that will be considered a valid input by the
input validator and passed along. Once again, this doesn't appear to be an
immediate security issue, as later we run a `literal_eval()` on it, which will
error out. But it's safest if we catch things like this up front during input
validation.
From what I can tell, explicitly whitelisting *top_k_categorical_acurracy*
for a metric is unnecessary, since it is already a builtin keras function, and
we don't allow any arguments to be passed to it anyway, it must be an exact
match to a builtin or a custom function.
3. We were comparing the metrics parameter value substring `metrics[2:-2]`
to the builtin and custom function names, which implicitly assumes that the
first 2 characters are [' or [" and the last two are '] or "].
Handling this more carefully using a literal eval to extract the string
inside the first element of the list prevents sneaky inputs like
`metrics=[*__builtins__ ]`. (We already validate elsewhere that the full
metrics parameter value is of type `list`.)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]