[ 
https://issues.apache.org/jira/browse/ARROW-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche updated ARROW-10641:
------------------------------------------
    Description: 
A "replace" or "map" kernel to replace values in an array based on a mapping. 
This would be similar to the pandas {{Series.replace}} (or {{Series.map}}) 
method. A small illustration of what is meant:

{code}
In [41]: s = pd.Series(["Yes", "Y", "No", "N"])

In [42]: s
Out[42]: 
0    Yes
1      Y
2     No
3      N
dtype: object

In [43]: s.replace({"Y": "Yes", "N": "No"})
Out[43]: 
0    Yes
1    Yes
2     No
3     No
dtype: object

{code}

Note: in pandas the difference between "replace" and "map" is that replace 
only replaces a value if it is present in the mapping and leaves all other 
values unchanged, while map replaces every value in the input array with the 
corresponding value in the mapping and returns null for values not present in 
the mapping. A keyword could be used to select between these two behaviours.
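
To make the two behaviours concrete, here is a minimal pure-Python sketch of the semantics described above. The function name {{replace_values}} and the keyword {{null_if_missing}} are hypothetical, chosen only for illustration; they are not an existing Arrow or pandas API, and an actual kernel might expose the option differently.

```python
def replace_values(values, mapping, *, null_if_missing=False):
    """Hypothetical helper (not an Arrow or pandas API).

    null_if_missing=False: "replace" semantics, unmatched values are kept.
    null_if_missing=True:  "map" semantics, unmatched values become None.
    """
    result = []
    for value in values:
        if value in mapping:
            result.append(mapping[value])
        elif null_if_missing:
            result.append(None)
        else:
            result.append(value)
    return result


values = ["Yes", "Y", "No", "N"]
mapping = {"Y": "Yes", "N": "No"}

replaced = replace_values(values, mapping)
mapped = replace_values(values, mapping, null_if_missing=True)

print(replaced)  # ['Yes', 'Yes', 'No', 'No']
print(mapped)    # [None, 'Yes', None, 'No']
```

With the default, the output matches the {{Series.replace}} result shown above; with {{null_if_missing=True}} the values "Yes" and "No" become null because they are not keys of the mapping, which is what {{Series.map}} does with a dict.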

Note: this is different from ARROW-10306, which is about string replacement 
_within_ array elements (replacing a substring in each string element of the 
array), while this issue is about replacing full elements of the array.
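
The distinction can be illustrated with the pandas equivalents of the two operations. This is only a sketch of the semantics using pandas, assumed here as the reference behaviour; it says nothing about how either Arrow kernel would be implemented.

```python
import pandas as pd

s = pd.Series(["Yes", "Y", "No", "N"])

# Whole-element replacement (this issue): only elements equal to "Y" change.
whole = s.replace({"Y": "Yes"})

# Substring replacement within each element (the ARROW-10306 case):
# every occurrence of "Y" inside a string is rewritten, so "Yes" changes too.
within = s.str.replace("Y", "Yes", regex=False)

print(whole.tolist())
print(within.tolist())
```

The element "Yes" is left alone by the whole-element replacement, but the substring replacement rewrites its leading "Y".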

cc [~maartenbreddels]

  was:
A "replace" or "map" kernel to replace values in an array based on a mapping. 
This would be similar to the pandas {{Series.replace}} (or {{Series.map}}) 
method. A small illustration of what is meant:

{code}
In [41]: s = pd.Series(["Yes", "Y", "No", "N"])

In [42]: s
Out[42]: 
0    Yes
1      Y
2     No
3      N
dtype: object

In [43]: s.replace({"Y": "Yes", "N": "No"})
Out[43]: 
0    Yes
1    Yes
2     No
3     No
dtype: object

{code}

Note: in pandas the difference between "replace" and "map" is that replace will 
only replace a value if it is present in the mapping, while map will replace 
every value in the input array with the corresponding value in the mapping and 
return null if not present in the mapping.

Note: this is different from ARROW-10306, which is about string replacement 
_within_ array elements (replacing a substring in each string element of the 
array), while this issue is about replacing full elements of the array.

cc [~maartenbreddels]


> [C++] A "replace" or "map" kernel to replace values in array based on mapping
> -----------------------------------------------------------------------------
>
>                 Key: ARROW-10641
>                 URL: https://issues.apache.org/jira/browse/ARROW-10641
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Joris Van den Bossche
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
