[
https://issues.apache.org/jira/browse/ARROW-17821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17608723#comment-17608723
]
Joris Van den Bossche edited comment on ARROW-17821 at 9/23/22 12:25 PM:
-------------------------------------------------------------------------
It seems that a map type then better suits this case (if the lists always have
equal length and are basically a key-value mapping; a Map type is in fact also
modelled as a List<Struct>), as you also mentioned in the top post. A MapArray
can be created from two individual list arrays as follows:
{code}
In [1]: A = pa.array([['a', 'b'], ['a', 'b', 'c']])
In [2]: B = pa.array([[1, 2], [3, 4, 5]])
In [3]: M = pa.MapArray.from_arrays(A.offsets, A.values, B.values)
In [4]: M.type
Out[4]: MapType(map<string, int64>)
In [5]: M
Out[5]:
<pyarrow.lib.MapArray object at 0x7fe7620e7340>
[
  keys:
  [
    "a",
    "b"
  ]
  values:
  [
    1,
    2
  ],
  keys:
  [
    "a",
    "b",
    "c"
  ]
  values:
  [
    3,
    4,
    5
  ]
]
In [6]: M.to_pandas()
Out[6]:
0 [(a, 1), (b, 2)]
1 [(a, 3), (b, 4), (c, 5)]
dtype: object
{code}
And actually, from this MapArray, it is also quite easy to construct the
equivalent List<Struct> array:
{code}
In [13]: pa.ListArray.from_arrays(M.offsets, M.values).type
Out[13]: ListType(list<item: struct<key: string not null, value: int64>>)
{code}
> Implement zip()
> ---------------
>
> Key: ARROW-17821
> URL: https://issues.apache.org/jira/browse/ARROW-17821
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++, Python
> Reporter: Adam Lippai
> Priority: Major
>
> If column A has list(x), column B has list(y), column C has list(z), I'd
> like to be able to create D = zip(A,B,C) where D would be list({ A: x, B: y,
> C: z}).
> x, y, z are types in the example; the type of the resulting D is list(struct).
> Other features to consider:
> * Zipping list(struct) with list(x) or list(struct) with list(struct)
> should be able to merge
> * Zipping A,B into a Map with keys from A, values from B
--
This message was sent by Atlassian Jira
(v8.20.10#820010)