Nandish Jayaram created MADLIB-1326:
---------------------------------------

             Summary: DL: Dev-check fails when keras_fit is called after array_scalar_mult
                 Key: MADLIB-1326
                 URL: https://issues.apache.org/jira/browse/MADLIB-1326
             Project: Apache MADlib
          Issue Type: Bug
          Components: Deep Learning
            Reporter: Nandish Jayaram
             Fix For: v1.16


In madlib_keras dev-check, we create the input data to fit using 
{{minibatch_preprocessor_dl()}}. This function internally calls 
{{array_scalar_mult()}}. If we call either of these functions followed by 
{{madlib_keras_fit()}}, then the following error pops up:
{code:java}
NOTICE:  Releasing segworker groups to finish aborting the transaction.
ERROR:  could not connect to segment: initialization of segworker group failed 
(cdbgang.c:237)
{code}
The Postgres logs suggest that there was a segmentation fault, and it appears 
to happen the moment {{import keras}} is called in {{madlib_keras_fit()}}.

This issue was first noticed while working on MADLIB-1304, which was closed 
with [this commit|https://github.com/apache/madlib/commit/241074ae68cb8e15437f98abf1c2e3c7bb3471ae],
as the comment [in this line|https://github.com/apache/madlib/commit/241074ae68cb8e15437f98abf1c2e3c7bb3471ae#diff-f89c193e163bfe0e7e3821445e38fa97R29]
suggests. At that time the failure happened on Greenplum, and deep learning 
was not yet supported on Postgres. It was noticed again while working on 
MADLIB-1311, which added Postgres support. At that point, the failure happened 
on Postgres and there were no failures on Greenplum.

While working on MADLIB-1311, we tried a couple of things and observed some 
odd behavior. We created a dummy function:
{code:java}
create function dummy()
returns void as
$$
import keras
$$
language plpythonu;
{code}
If we run {{select dummy()}} *before* running {{minibatch_preprocessor_dl()}} 
or {{array_scalar_mult()}}, the whole dev-check passes. But running the same 
function right after calling either of those functions causes a failure. So 
it looks like any UDF that calls {{import keras}} *must* be run *before* 
calling {{minibatch_preprocessor_dl()}} or {{array_scalar_mult()}}.
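
The two orderings described above can be sketched as a small repro helper. This is only an illustration under assumptions: the {{madlib}} schema name, the sample array arguments, and the helper name {{run_in_order}} are not from the dev-check itself, and {{dummy()}} is the function defined above.

```python
# Hypothetical repro helper: the only difference between the two lists
# is the order in which the statements are executed in one session.

# Calls the dummy() UDF defined above, which only runs "import keras".
IMPORT_KERAS_UDF = "SELECT dummy();"

# Assumption: MADlib is installed in a schema named "madlib";
# the array and scalar values are arbitrary sample inputs.
ARRAY_OP = (
    "SELECT madlib.array_scalar_mult("
    "ARRAY[1.0, 2.0]::float8[], 2.0::float8);"
)

# Reported to pass: keras is imported before the array function runs.
PASSING_ORDER = [IMPORT_KERAS_UDF, ARRAY_OP]

# Reported to fail: keras is imported after the array function runs.
FAILING_ORDER = [ARRAY_OP, IMPORT_KERAS_UDF]


def run_in_order(conn, statements):
    """Execute statements one by one on a DB-API connection.

    Returns the first statement that raised an error, or None if
    every statement completed.
    """
    cur = conn.cursor()
    for stmt in statements:
        try:
            cur.execute(stmt)
        except Exception:
            return stmt
    return None
```

Here {{conn}} can be any Python DB-API connection to the test database. Each ordering should be run in a fresh session, since the behavior depends on what has already been loaded into the backend process.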



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)