[
https://issues.apache.org/jira/browse/MADLIB-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16289502#comment-16289502
]
Nikhil edited comment on MADLIB-1185 at 12/13/17 4:42 PM:
----------------------------------------------------------
Here is a possible solution:
1. Create a new C++ file that will be shared by all five modules in question:
viterbi.cpp, lda.cpp, svd.cpp, matrix_ops.cpp and arima.cpp.
2. This file will define the type-info structs FLOAT8TI, INT4TI and INT8TI. It
will also contain an init function that fills them in by calling
madlib_get_typlenbyvalalign, something like:
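{code}
void init_pg_types()
{
    // Look up length, pass-by-value flag and alignment for each shared type.
    madlib_get_typlenbyvalalign(FLOAT8OID, &FLOAT8TI.len, &FLOAT8TI.byval,
                                &FLOAT8TI.align);
    madlib_get_typlenbyvalalign(INT4OID, &INT4TI.len, &INT4TI.byval,
                                &INT4TI.align);
    madlib_get_typlenbyvalalign(INT8OID, &INT8TI.len, &INT8TI.byval,
                                &INT8TI.align);
}
{code}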
3. Create an SQL interface for this init function, `init_pg_types`, which the
Python layer of all five modules will have to call. We need to make sure that
this init function is called before any other module function that uses the
FLOAT8TI, INT4TI or INT8TI structs.
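The steps above can be sketched as a self-contained C++ file. This is a sketch under stated assumptions, not MADlib's actual code: the `TypeInfo` layout (including its `oid` member), the stub `madlib_get_typlenbyvalalign` with hardcoded values for a 64-bit build, and the `pg_types_initialized` flag with its `require_type` guard are all hypothetical. The OID constants match PostgreSQL's pg_type catalog. The guard models the ordering requirement from step 3: reading a struct before `init_pg_types()` has run raises an error instead of silently using zeroed type info.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <stdexcept>

using Oid = unsigned int;
using int16 = std::int16_t;

// OIDs as defined in PostgreSQL's pg_type catalog.
constexpr Oid INT8OID = 20;
constexpr Oid INT4OID = 23;
constexpr Oid FLOAT8OID = 701;

// Hypothetical layout: cached type length, pass-by-value flag, alignment code.
struct TypeInfo {
    Oid oid;
    int16 len;
    bool byval;
    char align;
};

// Hypothetical stub standing in for the real catalog lookup; the values
// assume a 64-bit build where float8 and int8 are passed by value.
void madlib_get_typlenbyvalalign(Oid typid, int16* len, bool* byval, char* align) {
    switch (typid) {
    case INT8OID:   *len = 8; *byval = true; *align = 'd'; break;
    case INT4OID:   *len = 4; *byval = true; *align = 'i'; break;
    case FLOAT8OID: *len = 8; *byval = true; *align = 'd'; break;
    default: throw std::invalid_argument("unsupported type OID");
    }
}

// Shared definitions living in the one new file; the five modules would
// declare them extern instead of defining their own copies.
TypeInfo FLOAT8TI = {FLOAT8OID, 0, false, '\0'};
TypeInfo INT4TI = {INT4OID, 0, false, '\0'};
TypeInfo INT8TI = {INT8OID, 0, false, '\0'};
static bool pg_types_initialized = false;  // hypothetical call-order flag

// Fills every shared struct, then records that initialization happened.
void init_pg_types() {
    for (TypeInfo* ti : {&FLOAT8TI, &INT4TI, &INT8TI}) {
        madlib_get_typlenbyvalalign(ti->oid, &ti->len, &ti->byval, &ti->align);
        assert(ti->len > 0);  // the lookup must have filled in a real length
    }
    pg_types_initialized = true;
}

// Hypothetical accessor for module code: refuses to hand out type info
// until init_pg_types() has been called.
const TypeInfo& require_type(const TypeInfo& ti) {
    if (!pg_types_initialized) {
        throw std::logic_error("init_pg_types() must be called before using type info");
    }
    return ti;
}
```

In this sketch, a module that forgets the init call fails loudly at its first `require_type` access rather than reading a zero length. The comment itself relies on the Python layer calling the SQL-level init first; the flag-and-guard check is just one possible way to make a missed call visible.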
This fix will be moved to the MADlib 1.14 release.
> Postgres 10 support for MADlib with large tables
> ------------------------------------------------
>
> Key: MADLIB-1185
> URL: https://issues.apache.org/jira/browse/MADLIB-1185
> Project: Apache MADlib
> Issue Type: Bug
> Components: DB Abstraction Layer
> Reporter: Nikhil
> Fix For: v1.13
>
>
> Running MADlib on Postgres 10 with a large dataset (98000 rows with a double
> array column) causes the database to crash.
> Repro steps:
> {code}
> 1. create table foo (id integer, x double precision[], y integer);
> 2. Insert at least 1 million rows like these
> id | x | y
> -------+-------------------------+---
> 97440 | {1,0.2,0,1,0,1,0,0,0,0} | 1
> 3. Running any MADlib C UDF followed by a count(*) on foo crashes the
> database:
> select madlib.poisson_random(1); select count(*) from foo;
> or
> select madlib.svec_plus('{1}:{5}', '{1}:{4}'); select count(*) from foo;
> {code}