[
https://issues.apache.org/jira/browse/ARROW-18091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hirokazu SUZUKI updated ARROW-18091:
------------------------------------
Summary: [Ruby] Arrow::Table#join returns duplicated key columns (was:
[Ruby] Arrow::Table#join returns separated columns by key)
> [Ruby] Arrow::Table#join returns duplicated key columns
> -------------------------------------------------------
>
> Key: ARROW-18091
> URL: https://issues.apache.org/jira/browse/ARROW-18091
> Project: Apache Arrow
> Issue Type: Bug
> Components: Ruby
> Reporter: Hirokazu SUZUKI
> Priority: Major
>
> `Arrow::Table#join` returns the key column twice, once from each table.
> Duplicate column names are allowed in Arrow, but a single key column would
> be preferable.
> Also, with `type: :full_outer`, the key columns from both tables should be
> merged into one column.
> table1
> =>
> #<Arrow::Table:0x7f9706109380 ptr=0x55a91a4cac10>
>   KEY X
> 0 A   1
> 1 B   2
> 2 C   3
> table2
> =>
> #<Arrow::Table:0x7f970415d2c0 ptr=0x55a91a348ce0>
>   KEY X
> 0 A   4
> 1 B   5
> 2 D   6
>
> Should omit `:KEY` from the right table:
> table1.join(table2, :KEY)
> =>
> #<Arrow::Table:0x7f96fd152548 ptr=0x55a91af21110>
>   KEY X KEY X
> 0 A   1 A   4
> 1 B   2 B   5
>
> Should merge the `:KEY` columns from both tables:
> table1.join(table2, :KEY, type: :full_outer)
> =>
> #<Arrow::Table:0x7f96fd0e1550 ptr=0x55a91a1a6410>
>   KEY    X      KEY    X
> 0 A      1      A      4
> 1 B      2      B      5
> 2 C      3      (null) (null)
> 3 (null) (null) D      6
>
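The result shape requested in the issue can be modeled in plain Ruby, as a rough sketch only. It does not use `Arrow::Table`: rows are ordinary hashes, `join_rows` is a hypothetical helper, and the `x_left`/`x_right` names are an assumption made only because a Ruby hash cannot hold two `X` entries. Keys are assumed to be unique within each table.

```ruby
# Plain-Ruby model of the requested join output (not the red-arrow API).
# `join_rows` is a hypothetical helper; keys are assumed unique per table.
def join_rows(left, right, key, type: :inner)
  right_by_key = right.to_h { |row| [row[key], row] }
  matched = []

  # Walk the left rows; emit a single key column instead of one per side.
  rows = left.filter_map do |l|
    r = right_by_key[l[key]]
    next if r.nil? && type == :inner
    matched << l[key] if r
    { key => l[key], x_left: l[:X], x_right: r && r[:X] }
  end

  # For a full outer join, append right-only rows, reusing the same key column.
  if type == :full_outer
    right.each do |r|
      next if matched.include?(r[key])
      rows << { key => r[key], x_left: nil, x_right: r[:X] }
    end
  end

  rows
end

table1 = [{ KEY: "A", X: 1 }, { KEY: "B", X: 2 }, { KEY: "C", X: 3 }]
table2 = [{ KEY: "A", X: 4 }, { KEY: "B", X: 5 }, { KEY: "D", X: 6 }]

inner = join_rows(table1, table2, :KEY)
full  = join_rows(table1, table2, :KEY, type: :full_outer)
```

With the sample tables from the issue, `inner` holds the `A` and `B` rows with one `KEY` entry each, and `full` additionally holds a `C` row (right value `nil`) and a `D` row (left value `nil`), so no key is ever `nil`.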
--
This message was sent by Atlassian Jira
(v8.20.10#820010)