[jira] [Created] (ARROW-5713) fancy indexing on pa.array

Artem KOZHEVNIKOV (JIRA) Mon, 24 Jun 2019 11:47:23 -0700

Artem KOZHEVNIKOV created ARROW-5713:
----------------------------------------


             Summary: fancy indexing on pa.array
                 Key: ARROW-5713
                 URL: https://issues.apache.org/jira/browse/ARROW-5713
             Project: Apache Arrow
          Issue Type: New Feature
          Components: C++, Python
            Reporter: Artem KOZHEVNIKOV


In numpy one can do :
{code:python}
In [2]: import numpy as np                                                      
                                                                                
In [3]: a = np.array(['a', 'bb', 'ccc', 'dddd'], dtype="O")                     
                                                                                
In [4]: indices = np.array([0, -1, 2, 2, 0, 3])                                 
                                                                                
In [5]: a[indices]                                                              
                                                                                
Out[5]: array(['a', 'dddd', 'ccc', 'ccc', 'a', 'dddd'], dtype=object)
{code}
It would be nice to have a similar feature in pyarrow.

Currently, pa.Array.__getitem__ supports only a slice or a single element as an
argument.

Of course, there are some workarounds that build on that, such as the following:
{code:python}
In [6]: import pyarrow as pa

In [7]: a = pa.array(['a', 'bb', 'ccc', 'dddd'])

In [8]: pa.array(a.to_pandas()[indices])  # if len(indices) is high
Out[8]:
<pyarrow.lib.StringArray object at 0x91bd845e8>
[
  "a",
  "dddd",
  "ccc",
  "ccc",
  "a",
  "dddd"
]

In [9]: pa.array([a[i].as_py() for i in indices])  # if len(indices) is low
Out[9]:
<pyarrow.lib.StringArray object at 0x91bc14868>
[
  "a",
  "dddd",
  "ccc",
  "ccc",
  "a",
  "dddd"
]
{code}
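Neither workaround is memory- or CPU-efficient: the first converts the whole array out of Arrow and back, and the second builds a Python object per selected element.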
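
To make the requested semantics concrete, here is a minimal pure-Python sketch of the gather operation, including numpy-style handling of negative indices. The name `fancy_take` is a hypothetical helper used only for illustration; it is not a pyarrow API, and a real implementation would presumably do this in C++ on the Arrow buffers rather than element by element.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def fancy_take(values: Sequence[T], indices: Sequence[int]) -> List[T]:
    """Gather values[i] for each i in indices, numpy-style.

    Hypothetical helper for illustration only; not a pyarrow API.
    """
    n = len(values)
    out: List[T] = []
    for i in indices:
        # Negative indices count from the end, as in numpy.
        j = i + n if i < 0 else i
        if not 0 <= j < n:
            raise IndexError(f"index {i} is out of bounds for length {n}")
        out.append(values[j])
    return out


result = fancy_take(["a", "bb", "ccc", "dddd"], [0, -1, 2, 2, 0, 3])
print(result)  # ['a', 'dddd', 'ccc', 'ccc', 'a', 'dddd']
```

The result matches the numpy output shown at the top of this report. Repeated and negative indices are the two cases that a slice or a single-element lookup cannot express.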

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
