[ https://issues.apache.org/jira/browse/ARROW-17216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17574051#comment-17574051 ]

Aldrin Montana commented on ARROW-17216:
----------------------------------------

I am currently working on scalar hash functions, which might help to understand 
the hashing interfaces:
* A way to interface with hashing functions from *key_hash.h*: [scalar_hash.cc#L119|https://github.com/apache/arrow/blob/3c1fd3b03fa2b143d582244ca1bb93fbc0c84bcf/cpp/src/arrow/compute/kernels/scalar_hash.cc#L119]
* A way to interface with hashing functions from *hashing.h*: [scalar_hash.cc#L160|https://github.com/apache/arrow/blob/3c1fd3b03fa2b143d582244ca1bb93fbc0c84bcf/cpp/src/arrow/compute/kernels/scalar_hash.cc#L160]

As for *ArraySpan* vs *KeyColumnArray*, they are both ultimately views into the 
same buffers (from *ArrayData*). I don't see anything that makes it difficult 
to support nested types in either, but I also don't see anything that 
explicitly supports them (unless *ArraySpan* does so through some subtle use of 
templates). I came across this issue while checking whether anyone was working 
on convenient nested type support for *ArraySpan*. It is currently a view into 
*ArrayData* buffers, but it lacks convenience functions in the spirit of 
*Array*-level functions such as *StructArray::field()*.
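To make the "view into buffers" point concrete for nested types, here is a minimal pure-Python sketch of what a span-like view over a {{list<float>}} column has to do: follow the offsets buffer into the flat child values while honoring the view's slot offset. This is not Arrow's API; {{ListView}}, {{StructView}} and their methods are hypothetical names, and the child values are simplified to a plain Python list. {{StructView.field()}} imitates what *StructArray::field()* provides at the *Array* level: returning a child re-sliced to the parent's range.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ListView:
    """Hypothetical read-only view over the buffers of a list<float> array.

    Follows the Arrow columnar list layout: an optional validity bitmap
    (least-significant bit first), an int32 offsets buffer with one more
    entry than there are slots, and a flat child values buffer. `offset`
    and `length` select a slice of slots without copying any buffer.
    """
    validity: Optional[bytes]  # None means every slot is valid
    offsets: List[int]         # list j spans values[offsets[j]:offsets[j + 1]]
    values: List[float]        # flattened child values shared by all slots
    offset: int                # first slot of this view within the buffers
    length: int                # number of slots visible through this view

    def is_valid(self, i: int) -> bool:
        if self.validity is None:
            return True
        j = self.offset + i
        return ((self.validity[j // 8] >> (j % 8)) & 1) == 1

    def element(self, i: int) -> Optional[List[float]]:
        # Resolving slot i needs two offsets plus a slice of the child values.
        if not 0 <= i < self.length:
            raise IndexError(i)
        if not self.is_valid(i):
            return None
        j = self.offset + i
        return self.values[self.offsets[j]:self.offsets[j + 1]]


@dataclass
class StructView:
    """Hypothetical struct view whose children share the parent's slot range."""
    children: List[ListView]
    offset: int
    length: int

    def field(self, k: int) -> ListView:
        # Analogous to StructArray::field(): re-slice child k to the parent's
        # range. Parent-level nulls are not merged into the child here.
        c = self.children[k]
        return ListView(c.validity, c.offsets, c.values,
                        c.offset + self.offset, self.length)
```

For example, {{ListView(bytes([0b101]), [0, 2, 2, 3], [1.0, 2.0, 3.0], 0, 3)}} describes {{[[1.0, 2.0], null, [3.0]]}}; wrapping it as {{StructView([arr], 1, 2)}} and calling {{field(0)}} gives a view whose two elements are {{null}} and {{[3.0]}}.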

I will hopefully make more headway on this tomorrow. At least for nested type 
support via *ArraySpan* and/or *KeyColumnArray*, I might be able to put 
together a tutorial-style cookbook on how to access the data of nested types 
appropriately.

> [C++] Support joining tables with non-key fields as list
> --------------------------------------------------------
>
>                 Key: ARROW-17216
>                 URL: https://issues.apache.org/jira/browse/ARROW-17216
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Jayjeet Chakraborty
>            Priority: Major
>              Labels: query-engine
>
> I am trying to join 2 Arrow tables where some columns are of {{list<float>}} 
> data type. Note that my join columns/keys are primitive data types, and some of 
> my non-join columns are of {{list<float>}}. But PyArrow {{join()}} cannot join 
> such a table, although pandas can. It says
> {{ArrowInvalid: Data type list<item: float> is not supported in join non-key field}}
> when I execute this piece of code:
> {{joined_table = table_1.join(table_2, ['k1', 'k2', 'k3'])}}
> A [Stack Overflow|https://stackoverflow.com/questions/73071105/listitem-float-not-supported-in-join-non-key-field] 
> response pointed out that Arrow currently cannot handle non-fixed-width types 
> for joins. Can this be fixed, or is this intentional?
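The error suggests the limitation lies in how non-key (payload) columns are represented inside the join, rather than in joining itself. As a hedged illustration, here is a plain-Python inner equi-join (not Arrow's hash join; {{hash_join_inner}} and the dict-of-lists table shape are hypothetical): only the key columns are hashed, the build side stores row indices, and payload of any type, including variable-length lists, is gathered by index after matching.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List, Sequence, Tuple

Columns = Dict[str, List[Any]]  # column name -> values, all of equal length


def hash_join_inner(left: Columns, right: Columns,
                    keys: Sequence[str]) -> Columns:
    """Inner equi-join that hashes only the key columns.

    The build side stores row indices rather than payload values, so
    non-key columns of any type (including variable-length lists) are
    only touched in the final gather step. Assumes non-key column names
    do not collide between the two inputs; None keys compare equal here,
    unlike SQL NULL semantics.
    """
    # Build: map each right-side key tuple to the rows that carry it.
    build: DefaultDict[Tuple[Any, ...], List[int]] = defaultdict(list)
    for ri in range(len(right[keys[0]])):
        build[tuple(right[k][ri] for k in keys)].append(ri)

    # Probe: collect matching (left_row, right_row) index pairs.
    pairs: List[Tuple[int, int]] = []
    for li in range(len(left[keys[0]])):
        for ri in build.get(tuple(left[k][li] for k in keys), []):
            pairs.append((li, ri))

    # Gather ("take"): materialize every output column by row index.
    out: Columns = {k: [left[k][li] for li, _ in pairs] for k in keys}
    for name, col in left.items():
        if name not in keys:
            out[name] = [col[li] for li, _ in pairs]
    for name, col in right.items():
        if name not in keys:
            out[name] = [col[ri] for _, ri in pairs]
    return out
```

The same index-then-take idea can likely serve as a PyArrow workaround until the engine supports list payloads: append a row-index column to each table, join tables that contain only the key and index columns, then {{take}} the original {{list<float>}} columns using the joined index columns. Note that PyArrow's {{join()}} defaults to a left outer join, whereas this sketch is inner only.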



--
This message was sent by Atlassian Jira
(v8.20.10#820010)