[ 
https://issues.apache.org/jira/browse/MADLIB-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Domino Valdano updated MADLIB-1340:
-----------------------------------
    Description: 
The minibatcher's internal logic for picking a default batch size isn't strict 
enough.  It can crash for arrays of datatypes narrower than 32 bits.  I 
tried to come up with a simple repro, but it still needs some work.  Here's 
what I have now for the 16-bit type REAL[]; I haven't had a chance to test it 
yet:

madlib=# CREATE TABLE foo AS SELECT id, ARRAY[1.0,2.0,3.0,4.0]::REAL[] AS x, 1 
as Y FROM (SELECT GENERATE_SERIES(1,33*1024*1024) AS id) ids;
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause. Creating a NULL policy 
entry.
SELECT 2097152
madlib=# \d foo;
      Table "public.foo"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
 x      | real[]  |
 y      | integer |
Distributed randomly

madlib=# SELECT madlib.minibatch_preprocessor_dl('foo','foo_batched',   'y',    
'x');

The only issue with the above is that it generates a table with one REAL per 
row.  Instead we need an array of REALs per row, something like this:

CREATE TABLE foo AS SELECT ARRAY[i,i,i,i,i] FROM (SELECT ARRAY[i,i,i,i,i] AS i 
FROM (SELECT GENERATE_SERIES(1,9*1024) AS i) a1 ) a2;
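
As a sanity check on the sizes involved, here is a small Python sketch that 
only does arithmetic on the figures quoted in the transcript.  It is not part 
of MADlib and the function names are made up for illustration.  One 
assumption to flag: PostgreSQL documents REAL as a 4-byte type, so the sketch 
uses 4 bytes per element rather than the 16 bits mentioned above, and it 
ignores per-row and per-array header overhead.

```python
# Arithmetic check on the repro's sizes. Names are illustrative only.
REAL_BYTES = 4  # assumption: PostgreSQL stores REAL (float4) in 4 bytes


def requested_rows() -> int:
    """Row count requested by GENERATE_SERIES(1, 33*1024*1024)."""
    return 33 * 1024 * 1024


def payload_bytes(rows: int, elements_per_row: int,
                  element_bytes: int = REAL_BYTES) -> int:
    """Raw array payload only, ignoring row and array header overhead."""
    return rows * elements_per_row * element_bytes


if __name__ == "__main__":
    rows = requested_rows()
    print(rows)                    # 34603008
    print(2 * 1024 * 1024)         # 2097152, the count psql reported
    print(payload_bytes(rows, 4))  # 553648128
```

The requested row count (34603008) does not match the "SELECT 2097152" line, 
which equals 2*1024*1024, so the transcript may have come from a different 
run than the CREATE TABLE statement shown.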

  was:
The minibatcher's internal logic for picking a default batch size isn't strict 
enough.  It can crash for arrays of datatypes which are less than 32-bits.  I 
tried to come up with a simple repro, but it still needs some work.  Here's 
what I have now, for 16-bit type REAL[], haven't had a chance to test it yet:

madlib=# CREATE TABLE foo AS SELECT id, ARRAY[1.0,2.0,3.0,4.0]::REAL[] AS x, 1 
as Y FROM (SELECT GENERATE_SERIES(1,33*1024*1024) AS id) ids;
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause. Creating a NULL policy 
entry.
SELECT 2097152
madlib=# \d foo;
      Table "public.foo"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
 x      | real[]  |
 y      | integer |
Distributed randomly

madlib=# SELECT madlib.minibatch_preprocessor_dl('foo','foo_batched',   'y',    
'x');


> minibatch_preprocessor_dl crashes with default batch size
> ---------------------------------------------------------
>
>                 Key: MADLIB-1340
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1340
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Deep Learning
>    Affects Versions: v1.16
>            Reporter: Domino Valdano
>            Priority: Minor
>             Fix For: v1.16
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
