[
https://issues.apache.org/jira/browse/MADLIB-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284304#comment-16284304
]
Nikhil (nikhilkak) edited comment on MADLIB-1185 at 12/11/17 5:19 PM:
----------------------------------------------------------
The exception comes from this code in PGException_proto.hpp:
{code}
class PGException : public std::runtime_error {
public:
    explicit
    PGException()
        : std::runtime_error("The backend raised an exception.") { }
    // FIXME: Do something useful with inErrorData
    PGException(ErrorData* /* inErrorData */)
        : std::runtime_error("The backend raised an exception.") { }
};
{code}
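Because the `ErrorData*` argument is ignored, every backend error surfaces with the same generic text. A minimal sketch of a constructor that keeps the backend's message, assuming a trimmed stand-in for Postgres's `ErrorData` (the real struct in `utils/elog.h` has many more fields, including `message`). `ErrorDataStub` and `PGExceptionSketch` are hypothetical names, not MADlib code.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for Postgres's ErrorData (utils/elog.h); only the
// primary message field is modelled here.
struct ErrorDataStub {
    const char* message;
};

// Sketch of an exception that appends the backend's message instead of
// discarding it, as the FIXME in PGException_proto.hpp suggests.
class PGExceptionSketch : public std::runtime_error {
public:
    PGExceptionSketch()
        : std::runtime_error("The backend raised an exception.") { }

    explicit PGExceptionSketch(const ErrorDataStub* inErrorData)
        : std::runtime_error(describe(inErrorData)) { }

private:
    // Builds the what() text; falls back to the generic text if no message.
    static std::string describe(const ErrorDataStub* err) {
        std::string text = "The backend raised an exception.";
        if (err != nullptr && err->message != nullptr) {
            text += " ";
            text += err->message;  // e.g. "invalid cache ID: 74"
        }
        return text;
    }
};
```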
The root cause of the problem lies in the type_info constructor in the
following files: viterbi.cpp, lda.cpp, svd.cpp, matrix_ops.cpp and arima.cpp.
All these files define a type_info struct like this:
{code}
typedef struct __type_info{
    Oid oid;
    int16_t len;
    bool byval;
    char align;
    __type_info(Oid oid):oid(oid)
    {
        madlib_get_typlenbyvalalign(oid, &len, &byval, &align);
    }
} type_info;
static type_info FLOAT8TI(FLOAT8OID);
{code}
madlib_get_typlenbyvalalign is a MADlib wrapper around the Postgres function
get_typlenbyvalalign. The wrapper catches the error raised by Postgres and does
not report the original message, so we replaced all calls to
madlib_get_typlenbyvalalign with get_typlenbyvalalign to see the actual error.
After that, we saw the following error:
{code}
ERROR: invalid cache ID: 74
CONTEXT: parallel worker
{code}
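The information loss caused by the wrapper can be modelled without Postgres. In this sketch, `lookup_type` (hypothetical) stands in for get_typlenbyvalalign and fails with a specific message, while `wrapped_lookup` (also hypothetical) catches that error and rethrows a generic one, so the caller only ever sees the generic text. Postgres itself reports errors through ereport/longjmp rather than C++ exceptions, so this models only what the caller gets to see, not the actual mechanism.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a backend lookup that fails with a specific error.
void lookup_type(unsigned int /* oid */) {
    throw std::runtime_error("invalid cache ID: 74");
}

// Mimics a wrapper that swallows the original error and rethrows a generic one.
void wrapped_lookup(unsigned int oid) {
    try {
        lookup_type(oid);
    } catch (const std::exception&) {
        throw std::runtime_error("The backend raised an exception.");
    }
}

// Returns the message a caller would see from the given call ("" if none).
std::string message_from(void (*fn)(unsigned int), unsigned int oid) {
    try {
        fn(oid);
    } catch (const std::exception& e) {
        return e.what();
    }
    return "";
}
```

Calling `message_from(wrapped_lookup, 701)` yields only the generic text, while `message_from(lookup_type, 701)` yields `invalid cache ID: 74`, which mirrors why bypassing the wrapper was needed to diagnose this bug.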
get_typlenbyvalalign, which the constructor uses to fill in the struct members
len, byval and align, in turn calls SearchSysCache1.
The problem is that the first time you call any C MADlib UDF in a psql session,
Postgres calls dlopen on libmadlib.so. The first dlopen of a library runs the
constructors of all its static objects, such as FLOAT8TI, so every type_info
constructor runs during dlopen and calls SearchSysCache1. Calling
SearchSysCache1 during library initialization is not recommended. Here is a
relevant Postgres thread about it:
https://www.postgresql.org/message-id/96420364a3d055172776752a1de80714%40smtp.hushmail.com
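The timing matters because FLOAT8TI is a namespace-scope static object: its constructor runs during static initialization, which in practice happens when the program or shared library is loaded (for libmadlib.so, inside dlopen), before any UDF body executes. A minimal sketch of that ordering, using a hypothetical `TypeInfoProbe` that only counts constructor calls where the real type_info performs its catalog lookup.

```cpp
#include <cassert>

// Counts how many times the constructor below has run.
static int g_ctor_calls = 0;

// Hypothetical stand-in for type_info: the constructor is where the real
// struct calls madlib_get_typlenbyvalalign (and thus SearchSysCache1).
struct TypeInfoProbe {
    unsigned int oid;
    explicit TypeInfoProbe(unsigned int o) : oid(o) {
        ++g_ctor_calls;  // the catalog lookup would happen here, at load time
    }
};

// Like FLOAT8TI: constructed during static initialization, not on first use.
static TypeInfoProbe FLOAT8_PROBE(701);

// A stand-in "UDF" that uses the object; by the time it runs, the
// constructor has already executed exactly once.
unsigned int udf_uses_type() {
    return FLOAT8_PROBE.oid;
}
```

The counter is already 1 before `udf_uses_type` is ever called, and calling it does not construct the object again.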
Hardcoding the values of all the type_info struct members inside the
constructor, instead of looking them up in the catalog, fixes the problem.
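A minimal sketch of the hardcoded approach, assuming only float8 and int4 are needed. The constants are the pg_type values of those built-in types (float8: OID 701, typlen 8, typalign 'd'; int4: OID 23, typlen 4, typbyval true, typalign 'i'). Whether float8 is passed by value depends on the Postgres build, so real code should use Postgres's FLOAT8PASSBYVAL macro; the `kFloat8ByVal` constant below is an assumption for a 64-bit build. `HardcodedTypeInfo` is a hypothetical name, not the actual MADlib change.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

using Oid = unsigned int;

// OIDs of the built-in types, as listed in pg_type.
constexpr Oid kFloat8Oid = 701;
constexpr Oid kInt4Oid = 23;

// Assumption: 64-bit build where float8 is passed by value. Real code should
// use Postgres's FLOAT8PASSBYVAL macro instead of this constant.
constexpr bool kFloat8ByVal = true;

// Hypothetical variant of type_info that fills its members from constants,
// so constructing it at library load time never touches the syscache.
struct HardcodedTypeInfo {
    Oid oid;
    std::int16_t len;
    bool byval;
    char align;

    explicit HardcodedTypeInfo(Oid o) : oid(o) {
        switch (o) {
            case kFloat8Oid: len = 8; byval = kFloat8ByVal; align = 'd'; break;
            case kInt4Oid:   len = 4; byval = true;         align = 'i'; break;
            default:
                // No catalog fallback: that lookup is what is unsafe at load time.
                throw std::invalid_argument("type not hardcoded");
        }
    }
};

// Safe as a static: construction does no catalog access.
static HardcodedTypeInfo FLOAT8TI(kFloat8Oid);
```

The trade-off is that every type the library needs must be added to the switch by hand; the sketch rejects unknown OIDs rather than silently falling back to a catalog lookup.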
> Postgres 10 support for MADlib with large tables
> ------------------------------------------------
>
> Key: MADLIB-1185
> URL: https://issues.apache.org/jira/browse/MADLIB-1185
> Project: Apache MADlib
> Issue Type: Bug
> Components: DB Abstraction Layer
> Reporter: Nikhil
> Fix For: v1.13
>
>
> Running MADlib on Postgres 10 with a large dataset (98000 rows with a double
> precision array column) causes the database to crash.
> Repro steps:
> {code}
> 1. create table foo (id integer, x double precision[], y integer);
> 2. Insert at least 1 million rows like these:
> id | x | y
> -------+-------------------------+---
> 97440 | {1,0.2,0,1,0,1,0,0,0,0} | 1
> 3. Running any C MADlib UDF followed by a count(*) on foo crashes the
> database:
> select madlib.poisson_random(1); select count(*) from foo;
> or
> select madlib.svec_plus('{1}:{5}', '{1}:{4}'); select count(*) from foo;
> {code}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)