Frank McQuillan created MADLIB-1334:
---------------------------------------

             Summary: Mini-batch preprocessor for DL running very slowly
                 Key: MADLIB-1334
                 URL: https://issues.apache.org/jira/browse/MADLIB-1334
             Project: Apache MADlib
          Issue Type: Bug
          Components: Module: Utilities
            Reporter: Frank McQuillan
             Fix For: v1.16


Observed on a 2-segment Greenplum 5.x cluster using the latest build from master:

current `minibatch_preprocessor`
1) 60K MNIST training examples = 28.1 sec
2) 10K MNIST test examples = 5.9 sec

new `minibatch_preprocessor_dl`
3) 60K MNIST training examples = 1912.3 sec
4) 10K MNIST test examples = 24.2 sec

I wonder if there is a bug here, or at least a performance issue. I thought
`minibatch_preprocessor_dl` was supposed to be faster than
`minibatch_preprocessor`.
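
To put numbers on the gap, here is a small sketch that computes the slowdown and the scaling from the four timings reported above. The dictionary keys and helper names (`timings`, `slowdown`, `scaling`) are made up for illustration and are not part of MADlib.

```python
# Timings in seconds, copied from the four measurements in this report.
# The key names are illustrative only; they are not MADlib identifiers.
timings = {
    "minibatch_preprocessor": {"train_60k": 28.1, "test_10k": 5.9},
    "minibatch_preprocessor_dl": {"train_60k": 1912.3, "test_10k": 24.2},
}


def slowdown(dataset):
    """How many times slower the DL preprocessor is on one dataset."""
    new = timings["minibatch_preprocessor_dl"][dataset]
    old = timings["minibatch_preprocessor"][dataset]
    return new / old


def scaling(function_name):
    """Runtime growth when going from 10K rows to 60K rows (6x the data)."""
    t = timings[function_name]
    return t["train_60k"] / t["test_10k"]


for dataset in ("train_60k", "test_10k"):
    print(f"{dataset}: dl is {slowdown(dataset):.1f}x slower")

for name in timings:
    print(f"{name}: 6x rows -> {scaling(name):.1f}x runtime")
```

With only two data points per function this proves nothing, but runtime growing much faster than the row count for `minibatch_preprocessor_dl`, while `minibatch_preprocessor` grows more slowly than the row count, would be consistent with a superlinear step somewhere in the new code path.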

(1)
{code}
madlib=# 
madlib=# SELECT madlib.minibatch_preprocessor('mnist_train',         -- Source table
madlib(#                                      'mnist_train_packed',  -- Output table
madlib(#                                      'y',                   -- Dependent variable
madlib(#                                      'x',                   -- Independent variables
madlib(#                                      NULL,                  -- Grouping
madlib(#                                      NULL,                  -- Buffer size
madlib(#                                      TRUE                   -- One-hot encode integer dependent var
madlib(#                                      );
 minibatch_preprocessor 
------------------------
 
(1 row)

Time: 28093.977 ms
{code}

(2)
{code}
madlib=# SELECT madlib.minibatch_preprocessor('mnist_test',          -- Source table
madlib(#                                      'mnist_test_packed',   -- Output table
madlib(#                                      'y',                   -- Dependent variable
madlib(#                                      'x',                   -- Independent variables
madlib(#                                      NULL,                  -- Grouping
madlib(#                                      NULL,                  -- Buffer size
madlib(#                                      TRUE                   -- One-hot encode integer dependent var
madlib(#                                      );
 minibatch_preprocessor 
------------------------
 
(1 row)

Time: 5934.194 ms
{code}

(3)
{code}
madlib=# SELECT madlib.minibatch_preprocessor_dl('mnist_train',         -- Source table
madlib(#                                         'mnist_train_packed',  -- Output table
madlib(#                                         'y',                   -- Dependent variable
madlib(#                                         'x',                   -- Independent variable
madlib(#                                         NULL,                  -- Buffer size
madlib(#                                         255,                   -- Normalizing constant
madlib(#                                         NULL
madlib(#                                         );
 minibatch_preprocessor_dl 
---------------------------
 
(1 row)

Time: 1912268.396 ms
{code}

(4)
{code}
madlib=# SELECT madlib.minibatch_preprocessor_dl('mnist_test',          -- Source table
madlib(#                                         'mnist_test_packed',   -- Output table
madlib(#                                         'y',                   -- Dependent variable
madlib(#                                         'x',                   -- Independent variable
madlib(#                                         NULL,                  -- Buffer size
madlib(#                                         255,                   -- Normalizing constant
madlib(#                                         NULL
madlib(#                                         );
 minibatch_preprocessor_dl 
---------------------------
 
(1 row)

Time: 24192.195 ms
{code}










