[ 
https://issues.apache.org/jira/browse/ARROW-7245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578136#comment-17578136
 ] 

Clark Zinzow edited comment on ARROW-7245 at 8/10/22 7:43 PM:
--------------------------------------------------------------

+1 to having {{Concatenate}} go through the type promotion logic in the compute 
layer. I'm running into a similar issue when concatenating tables whose columns 
have different numeric types that can certainly be promoted to a common numeric 
type. I'm currently working around this in application code by manually 
promoting each column to the common type for that column across all tables 
(mimicking Arrow's internal type promotion logic 
[here|https://github.com/apache/arrow/blob/6cc37cf2d1ba72c46b64fbc7ac499bd0d7296d20/cpp/src/arrow/compute/kernels/codegen_internal.cc#L145-L338]) 
before concatenating the tables.

Here is a pyarrow MWE:
{code:python}
# assumes: import pyarrow as pa
In [1]: t1 = pa.table({"a": pa.array([1, 2], type=pa.int16())})

In [2]: t2 = pa.table({"a": pa.array([3, 4], type=pa.int64())})

In [3]: pa.concat_tables([t1, t2])
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-91-40afec1155a5> in <module>
----> 1 pa.concat_tables([t1, t2])

~/.local/lib/python3.7/site-packages/pyarrow/table.pxi in pyarrow.lib.concat_tables()

~/.local/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()

~/.local/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Schema at index 1 was different:
a: int16
vs
a: int64
{code}
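One way to apply the manual workaround to the MWE above is sketched below. It assumes every table has the same column names and only numeric columns. It also swaps in NumPy's promotion rules ({{np.result_type}}) in place of Arrow's own, so results can differ for some mixes; for example, NumPy promotes {{uint64}} with {{int64}} to {{float64}}. {{promote_and_concat}} is a hypothetical helper, not a pyarrow API.

{code:python}
import numpy as np
import pyarrow as pa


def promote_and_concat(tables):
    """Hypothetical helper: cast all tables to a shared schema, then concatenate."""
    fields = []
    for name in tables[0].schema.names:
        col_types = [t.schema.field(name).type for t in tables]
        # Delegate the promotion decision to NumPy (differs from Arrow in places).
        np_dtype = np.result_type(*[ty.to_pandas_dtype() for ty in col_types])
        fields.append(pa.field(name, pa.from_numpy_dtype(np_dtype)))
    target = pa.schema(fields)
    # Table.cast requires the column names to match the target schema.
    return pa.concat_tables([t.cast(target) for t in tables])


t1 = pa.table({"a": pa.array([1, 2], type=pa.int16())})
t2 = pa.table({"a": pa.array([3, 4], type=pa.int64())})
result = promote_and_concat([t1, t2])
print(result.schema)
{code}

Using NumPy here keeps the sketch short; a faithful port of Arrow's promotion table would avoid the {{float64}} surprises noted above.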



> [C++] Allow automatic String -> LargeString promotions when concatenating tables
> --------------------------------------------------------------------------------
>
>                 Key: ARROW-7245
>                 URL: https://issues.apache.org/jira/browse/ARROW-7245
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>
> inspired by GitHub issue https://github.com/apache/arrow/issues/5874



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to