[
https://issues.apache.org/jira/browse/MADLIB-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jingyi Mei updated MADLIB-1270:
-------------------------------
Description:
There is some unexpected behavior when the vector column to be split contains
vectors with different numbers of elements. E.g.
Input table:
select * from test order by id;
id | t
----+---------
1 | {a,b}
2 | {c,d}
3 | {e,f}
4 | {g,h,i}
5 | {j}
(5 rows)
select madlib.vec2cols('test','test_out_5','t',array['c1','c2','c3'],'id');
ERROR: plpy.Error: vec2cols: Mismatch between size of vector_col and number of
cols in feature_names.
CONTEXT: Traceback (most recent call last):
PL/Python function "vec2cols", line 23, in <module>
return vec2cols_obj.vec2cols(**globals())
PL/Python function "vec2cols", line 149, in vec2cols
PL/Python function "vec2cols", line 112, in get_names_for_split_output_cols
PL/Python function "vec2cols", line 77, in _assert
PL/Python function "vec2cols"
select madlib.vec2cols('test','test_out_5','t',array['c1','c2'],'id');
vec2cols
----------
(1 row)
select * from test_out_5 order by id;
id | c1 | c2
----+----+----
1 | a | b
2 | c | d
3 | e | f
4 | g | h
5 | j |
(5 rows)
select madlib.vec2cols('test','test_out_6','t',array['c1'],'id');
ERROR: plpy.Error: vec2cols: Mismatch between size of vector_col and number of
cols in feature_names.
CONTEXT: Traceback (most recent call last):
PL/Python function "vec2cols", line 23, in <module>
return vec2cols_obj.vec2cols(**globals())
PL/Python function "vec2cols", line 149, in vec2cols
PL/Python function "vec2cols", line 112, in get_names_for_split_output_cols
PL/Python function "vec2cols", line 77, in _assert
PL/Python function "vec2cols"
--- Update ---
There are a few decisions to be made regarding supporting arrays of
different lengths:
- If we choose the array with maximal length in the vector_col, what do we do if
  the user's passed-in feature_names does not have the same number of elements?
- What are the performance issues with looking through our vector_col for the
  array with maximal length?
- How will we handle default feature names: will we create a feature name for
  every element of the longest array entry?
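
To make the questions above concrete, here is a minimal Python sketch of one
possible policy. It is not the MADlib implementation: the helper names
(default_feature_names, split_ragged), the "f" prefix for default names, and
the choice to raise an error on a feature_names length mismatch are all
assumptions made for illustration. It finds the longest vector in one pass,
pads shorter vectors with NULL (None), and creates one default name per
element of the longest vector.

```python
from typing import List, Optional, Sequence, Tuple


def default_feature_names(width: int, prefix: str = "f") -> List[str]:
    """Hypothetical default naming: one name per output column (f1, f2, ...)."""
    return ["{0}{1}".format(prefix, i) for i in range(1, width + 1)]


def split_ragged(
    vectors: Sequence[Sequence[str]],
    feature_names: Optional[Sequence[str]] = None,
) -> Tuple[List[str], List[List[Optional[str]]]]:
    """Split vectors of differing lengths into fixed-width rows.

    Assumed policy (one option among those under discussion):
    - the output width is the length of the longest vector;
    - shorter vectors are padded with None (NULL in the output table);
    - user-supplied feature_names must match that width exactly.
    """
    # Finding the maximum requires a full pass over the column, which is the
    # performance cost raised in the second question.
    max_len = max((len(v) for v in vectors), default=0)

    if feature_names is None:
        names = default_feature_names(max_len)
    elif len(feature_names) != max_len:
        raise ValueError(
            "feature_names has {0} entries but the longest vector has {1} "
            "elements".format(len(feature_names), max_len)
        )
    else:
        names = list(feature_names)

    padded = [list(v) + [None] * (max_len - len(v)) for v in vectors]
    return names, padded


if __name__ == "__main__":
    # Same data as the 'test' table in the report.
    test_rows = [["a", "b"], ["c", "d"], ["e", "f"], ["g", "h", "i"], ["j"]]
    names, rows = split_ragged(test_rows)
    print(names)
    for row in rows:
        print(row)
```

Under this policy the call with array['c1','c2'] from the report would be
rejected rather than silently dropping the third element of row 4. Truncating
to the number of supplied names instead would be an equally valid choice; the
sketch only shows what committing to the maximal length implies.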
was:
There is some unexpected behavior when the vector column to be split contains
vectors with different numbers of elements. E.g.
Input table:
select * from test order by id;
id | t
----+---------
1 | {a,b}
2 | {c,d}
3 | {e,f}
4 | {g,h,i}
5 | {j}
(5 rows)
select madlib.vec2cols('test','test_out_5','t',array['c1','c2','c3'],'id');
ERROR: plpy.Error: vec2cols: Mismatch between size of vector_col and number of
cols in feature_names.
CONTEXT: Traceback (most recent call last):
PL/Python function "vec2cols", line 23, in <module>
return vec2cols_obj.vec2cols(**globals())
PL/Python function "vec2cols", line 149, in vec2cols
PL/Python function "vec2cols", line 112, in get_names_for_split_output_cols
PL/Python function "vec2cols", line 77, in _assert
PL/Python function "vec2cols"
select madlib.vec2cols('test','test_out_5','t',array['c1','c2'],'id');
vec2cols
----------
(1 row)
select * from test_out_5 order by id;
id | c1 | c2
----+----+----
1 | a | b
2 | c | d
3 | e | f
4 | g | h
5 | j |
(5 rows)
select madlib.vec2cols('test','test_out_6','t',array['c1'],'id');
ERROR: plpy.Error: vec2cols: Mismatch between size of vector_col and number of
cols in feature_names.
CONTEXT: Traceback (most recent call last):
PL/Python function "vec2cols", line 23, in <module>
return vec2cols_obj.vec2cols(**globals())
PL/Python function "vec2cols", line 149, in vec2cols
PL/Python function "vec2cols", line 112, in get_names_for_split_output_cols
PL/Python function "vec2cols", line 77, in _assert
PL/Python function "vec2cols"
> Unexpected behavior in vec2cols function
> ----------------------------------------
>
> Key: MADLIB-1270
> URL: https://issues.apache.org/jira/browse/MADLIB-1270
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Utilities
> Reporter: Rashmi Raghu
> Priority: Minor
> Fix For: v1.15.1
>