[ 
https://issues.apache.org/jira/browse/ARROW-17821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17608723#comment-17608723
 ] 

Joris Van den Bossche edited comment on ARROW-17821 at 9/23/22 12:25 PM:
-------------------------------------------------------------------------

It seems that a map type would suit this case better (if the lists are always 
of equal length and basically form a key-value mapping; a Map type is in fact 
also modelled as a List<Struct>), as you also mentioned in the top post. A 
MapArray can be created from two individual list arrays as follows:

{code}
In [1]: A = pa.array([['a', 'b'], ['a', 'b', 'c']])

In [2]: B = pa.array([[1, 2], [3, 4, 5]])

In [3]: M = pa.MapArray.from_arrays(A.offsets, A.values, B.values)

In [4]: M.type
Out[4]: MapType(map<string, int64>)

In [5]: M
Out[5]: 
<pyarrow.lib.MapArray object at 0x7fe7620e7340>
[
  keys:
  [
    "a",
    "b"
  ]
  values:
  [
    1,
    2
  ],
  keys:
  [
    "a",
    "b",
    "c"
  ]
  values:
  [
    3,
    4,
    5
  ]
]

In [6]: M.to_pandas()
Out[6]: 
0            [(a, 1), (b, 2)]
1    [(a, 3), (b, 4), (c, 5)]
dtype: object
{code}

And actually, from this MapArray, it is also quite easy to construct the 
equivalent List<Struct> array:

{code}
In [13]: pa.ListArray.from_arrays(M.offsets, M.values).type
Out[13]: ListType(list<item: struct<key: string not null, value: int64>>)
{code}



> Implement zip()
> ---------------
>
>                 Key: ARROW-17821
>                 URL: https://issues.apache.org/jira/browse/ARROW-17821
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++, Python
>            Reporter: Adam Lippai
>            Priority: Major
>
> If column A has list(x), column B has list(y), column C has list(z), I'd 
> like to be able to create D = zip(A,B,C) where D would be list({ A: x, B: y, 
> C: z}).
> x, y, z are types in the example; the type of the resulting D is list(struct).
> Other features to consider:
>  * Zipping list(struct) with list(x) or list(struct) with list(struct) 
> should be able to merge
>  * Zipping A,B into a Map with keys from A, values from B



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
