[ 
https://issues.apache.org/jira/browse/ARROW-15643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492693#comment-17492693
 ] 

Will Ayd commented on ARROW-15643:
----------------------------------

It feels right to me to allow the target type to be a subset of the 
originating type. I'm not yet sure about allowing a different field order, 
though. Reordering definitely introduces some ambiguity: if you have an 
originating type of "x, y, z" and a target type of "z, y, x", I don't think 
it's clear whether the alignment should be by name or by position. Perhaps 
that is where the flag you are thinking about comes into play?
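
To make the ambiguity concrete, here is a minimal sketch in plain Python. It 
models a struct as an ordered list of (name, column) pairs rather than using 
Arrow itself, and the helper names align_by_name and align_by_position are 
hypothetical, chosen only for illustration. It is not how any Arrow cast is 
implemented.

```python
# Illustration only: a struct is modeled as an ordered list of
# (field_name, column) pairs. Both helper names are hypothetical.

def align_by_name(source, target_names):
    """Pick the source columns whose names match, in target order."""
    columns = dict(source)
    return [(name, columns[name]) for name in target_names]


def align_by_position(source, target_names):
    """Relabel the leading source columns with the target names, in order."""
    return [(name, column) for name, (_, column) in zip(target_names, source)]


source = [("x", [1, 2]), ("y", [3, 4]), ("z", [5, 6])]
target = ["z", "y", "x"]

by_name = align_by_name(source, target)
by_position = align_by_position(source, target)

print(by_name)      # [('z', [5, 6]), ('y', [3, 4]), ('x', [1, 2])]
print(by_position)  # [('z', [1, 2]), ('y', [3, 4]), ('x', [5, 6])]
```

Both results have the type "z, y, x", but the field "z" holds different data 
in each, which is why an explicit flag (or a documented default) seems needed 
once reordering is allowed.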

> [C++] Kernel to select subset of fields of a StructArray
> --------------------------------------------------------
>
>                 Key: ARROW-15643
>                 URL: https://issues.apache.org/jira/browse/ARROW-15643
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Joris Van den Bossche
>            Priority: Major
>              Labels: kernel
>
> Triggered by 
> https://stackoverflow.com/questions/71035754/pyarrow-drop-a-column-in-a-nested-structure.
> I thought there was already an issue about this, but I can't find one right away.
> Assume you have a struct array with some fields:
> {code}
> >>> arr = pa.StructArray.from_arrays([[1, 2, 3]]*3, names=['a', 'b', 'c'])
> >>> arr.type
> StructType(struct<a: int64, b: int64, c: int64>)
> {code}
> We have a kernel to select a single child field:
> {code}
> >>> pc.struct_field(arr, [0])
> <pyarrow.lib.Int64Array object at 0x7ffa9e229940>
> [
>   1,
>   2,
>   3
> ]
> {code}
> But if you want to subset the StructArray to some of its fields, resulting in 
> a new StructArray, that's not possible with {{struct_field}}, and doing this 
> manually is a bit cumbersome:
> {code}
> >>> fields = ['a', 'c']
> >>> arrays = [arr.field(n) for n in fields]
> >>> arr_subset = pa.StructArray.from_arrays(arrays, names=fields)
> >>> arr_subset.type
> StructType(struct<a: int64, c: int64>)
> {code}
> (this is still OK, but if you had a ChunkedArray, it certainly gets annoying)
> One option could be to expand the existing {{struct_field}} to allow 
> selecting multiple fields (although that probably gets ambiguous/confusing 
> with how you currently select a recursively nested field: [0, 1] currently 
> means "first child, second subchild" and not "first and second child"). 
> Another option is a new kernel like "struct_subset" or some other name.
> This might also overlap with general projection functionality? (cc 
> [~westonpace])



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
