Frank McQuillan created MADLIB-1334:
---------------------------------------
Summary: Mini-batch preprocessor for DL running very slowly
Key: MADLIB-1334
URL: https://issues.apache.org/jira/browse/MADLIB-1334
Project: Apache MADlib
Issue Type: Bug
Components: Module: Utilities
Reporter: Frank McQuillan
Fix For: v1.16
Observed on a 2-segment Greenplum 5.x cluster using the latest build from MASTER:
current `minibatch_preprocessor`
1) 60K MNIST training examples = 28.1 sec
2) 10K MNIST test examples = 5.9 sec
new `minibatch_preprocessor_dl`
3) 60K MNIST training examples = 1912.3 sec
4) 10K MNIST test examples = 24.2 sec
Is there a bug here, or at least a performance issue? I thought
`minibatch_preprocessor_dl` was supposed to be faster than
`minibatch_preprocessor`, not slower. The full commands and timings for
cases (1) through (4) follow.
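To put the four timings above in proportion, here is a quick back-of-the-envelope sketch. The dictionary layout and the helper names `slowdown` and `time_growth` are made up for illustration and are not part of MADlib; the numbers are copied from the list above.

```python
# Timings copied from the report: (number of rows, elapsed seconds).
timings = {
    "minibatch_preprocessor": {"train": (60_000, 28.1), "test": (10_000, 5.9)},
    "minibatch_preprocessor_dl": {"train": (60_000, 1912.3), "test": (10_000, 24.2)},
}


def slowdown(dataset):
    """How many times slower the DL preprocessor is on one dataset."""
    old_seconds = timings["minibatch_preprocessor"][dataset][1]
    new_seconds = timings["minibatch_preprocessor_dl"][dataset][1]
    return new_seconds / old_seconds


def time_growth(function_name):
    """Ratio of train time to test time for one function (rows grow 6x)."""
    train_seconds = timings[function_name]["train"][1]
    test_seconds = timings[function_name]["test"][1]
    return train_seconds / test_seconds


if __name__ == "__main__":
    for dataset in ("train", "test"):
        print(f"{dataset}: DL version is {slowdown(dataset):.1f}x slower")
    for name in timings:
        print(f"{name}: 6x more rows costs {time_growth(name):.1f}x more time")
```

By this arithmetic the DL version is roughly 68x slower on the training set and about 4x slower on the test set. Going from 10K to 60K rows costs the old function a bit under 5x more time but costs the DL function about 79x more, which hints at worse-than-linear scaling rather than a fixed overhead. That is only an inference from two data points per function.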
(1)
{code}
madlib=# SELECT madlib.minibatch_preprocessor('mnist_train',         -- Source table
madlib(#                                      'mnist_train_packed',  -- Output table
madlib(#                                      'y',                   -- Dependent variable
madlib(#                                      'x',                   -- Independent variables
madlib(#                                      NULL,                  -- Grouping
madlib(#                                      NULL,                  -- Buffer size
madlib(#                                      TRUE                   -- One-hot encode integer dependent var
madlib(#                                     );
minibatch_preprocessor
------------------------
(1 row)
Time: 28093.977 ms
{code}
(2)
{code}
madlib=# SELECT madlib.minibatch_preprocessor('mnist_test',         -- Source table
madlib(#                                      'mnist_test_packed',  -- Output table
madlib(#                                      'y',                  -- Dependent variable
madlib(#                                      'x',                  -- Independent variables
madlib(#                                      NULL,                 -- Grouping
madlib(#                                      NULL,                 -- Buffer size
madlib(#                                      TRUE                  -- One-hot encode integer dependent var
madlib(#                                     );
minibatch_preprocessor
------------------------
(1 row)
Time: 5934.194 ms
{code}
(3)
{code}
madlib=# SELECT madlib.minibatch_preprocessor_dl('mnist_train',         -- Source table
madlib(#                                         'mnist_train_packed',  -- Output table
madlib(#                                         'y',                   -- Dependent variable
madlib(#                                         'x',                   -- Independent variable
madlib(#                                         NULL,                  -- Buffer size
madlib(#                                         255,                   -- Normalizing constant
madlib(#                                         NULL
madlib(#                                        );
minibatch_preprocessor_dl
---------------------------
(1 row)
Time: 1912268.396 ms
{code}
(4)
{code}
madlib=# SELECT madlib.minibatch_preprocessor_dl('mnist_test',         -- Source table
madlib(#                                         'mnist_test_packed',  -- Output table
madlib(#                                         'y',                  -- Dependent variable
madlib(#                                         'x',                  -- Independent variable
madlib(#                                         NULL,                 -- Buffer size
madlib(#                                         255,                  -- Normalizing constant
madlib(#                                         NULL
madlib(#                                        );
minibatch_preprocessor_dl
---------------------------
(1 row)
Time: 24192.195 ms
{code}