Adam Hooper created ARROW-12774:
-----------------------------------
Summary: replace_substring_regex() creates invalid arrays => crash
Key: ARROW-12774
URL: https://issues.apache.org/jira/browse/ARROW-12774
Project: Apache Arrow
Issue Type: Bug
Components: C++, Python
Affects Versions: 4.0.0
Reporter: Adam Hooper
{code:python}
import pyarrow as pa
import pyarrow.compute  # pa.compute needs an explicit import

arr = pa.array(['Brussels', 'Brussels', 'Brussels', 'Brussels', 'Brussels',
'Brussels', 'Brussels', 'Brussels', 'Flanders', 'Flanders', 'Flanders',
'Flanders', 'Flanders', 'Flanders', 'Flanders', 'Flanders', 'Ostbelgien',
'Ostbelgien', 'Ostbelgien', 'Ostbelgien', 'Ostbelgien', 'Ostbelgien',
'Ostbelgien', 'Ostbelgien', 'Wallonia', 'Wallonia', 'Wallonia', 'Wallonia',
'Wallonia', 'Wallonia', 'Wallonia', 'Wallonia'])
arr2 = pa.compute.replace_substring_regex(arr, pattern="X", replacement="Y")
arr2.validate(full=True)
{code}
Expected results: a valid array
Actual results: {{pyarrow.lib.ArrowInvalid: Offset invariant failure:
non-monotonic offset at slot 32: 0 < 264}}
Because {{arr2}} is invalid, running {{arr.diff(arr2)}} then crashes the process:
{code}
terminate called after throwing an instance of 'std::length_error'
what(): basic_string::_S_create
Aborted (core dumped)
{code}
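To help read the validation message, here is a rough pure-Python model of the offsets invariant for a string array. It is only an illustration, not Arrow's validation code, and the helper names {{expected_offsets}} and {{first_non_monotonic_slot}} are made up for this sketch. It assumes the usual layout of length+1 offsets into the data buffer. The "bad" buffer is a hypothetical one whose final offset was left at 0. That matches the reported message, but it is not confirmed as the cause.

```python
# Illustration only: a pure-Python model of the offsets invariant for a
# variable-length string array. This is NOT Arrow's validation code.


def expected_offsets(values):
    """Return the len(values)+1 byte offsets a string array should carry."""
    offsets = [0]
    for v in values:
        offsets.append(offsets[-1] + len(v.encode("utf-8")))
    return offsets


def first_non_monotonic_slot(offsets):
    """Return the first slot whose offset is below the previous one, else None."""
    for slot in range(1, len(offsets)):
        if offsets[slot] < offsets[slot - 1]:
            return slot
    return None


# Same 32 values as in the report.
values = (["Brussels"] * 8 + ["Flanders"] * 8
          + ["Ostbelgien"] * 8 + ["Wallonia"] * 8)

good = expected_offsets(values)

# Hypothetical corrupted buffer: identical offsets, final entry left at 0.
bad = good[:-1] + [0]

slot = first_non_monotonic_slot(bad)
print(len(good), good[31], good[32])   # 33 264 272
print(slot, bad[slot], bad[slot - 1])  # 32 0 264
```

With these inputs the correct final offset would be 272, and the hypothetical buffer fails at slot 32 with 0 < 264, the same numbers as in the error above.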