Adam Hooper created ARROW-12774:
-----------------------------------

             Summary: replace_substring_regex() creates invalid arrays => crash
                 Key: ARROW-12774
                 URL: https://issues.apache.org/jira/browse/ARROW-12774
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
    Affects Versions: 4.0.0
            Reporter: Adam Hooper


{code:python}
import pyarrow as pa
import pyarrow.compute  # makes pa.compute available

arr = pa.array(['Brussels', 'Brussels', 'Brussels', 'Brussels', 'Brussels',
'Brussels', 'Brussels', 'Brussels', 'Flanders', 'Flanders', 'Flanders',
'Flanders', 'Flanders', 'Flanders', 'Flanders', 'Flanders', 'Ostbelgien',
'Ostbelgien', 'Ostbelgien', 'Ostbelgien', 'Ostbelgien', 'Ostbelgien',
'Ostbelgien', 'Ostbelgien', 'Wallonia', 'Wallonia', 'Wallonia', 'Wallonia',
'Wallonia', 'Wallonia', 'Wallonia', 'Wallonia'])
# The pattern "X" matches none of the values, so arr2 should equal arr.
arr2 = pa.compute.replace_substring_regex(arr, pattern="X", replacement="Y")
arr2.validate(full=True)  # raises ArrowInvalid
{code}

Expected results: a valid array

Actual results: {{pyarrow.lib.ArrowInvalid: Offset invariant failure: non-monotonic offset at slot 32: 0 < 264}}
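
The numbers in that message can be reproduced by hand. The sketch below is pure Python and does not call pyarrow; {{first_non_monotonic}} is a hypothetical helper that only imitates the monotonic-offset check. It assumes the usual string array layout of n + 1 offsets for n values, and it assumes (as one reading of the message, not a confirmed root cause) that the final offset was left at 0 instead of being written.

```python
# Hypothetical helper, not part of pyarrow: imitates the monotonic-offset check.
def first_non_monotonic(offsets):
    """Return (slot, value, previous) for the first decreasing offset, or None."""
    for slot in range(1, len(offsets)):
        if offsets[slot] < offsets[slot - 1]:
            return slot, offsets[slot], offsets[slot - 1]
    return None


# Same 32 values as in the report, written compactly.
values = ["Brussels"] * 8 + ["Flanders"] * 8 + ["Ostbelgien"] * 8 + ["Wallonia"] * 8

# A string array with n values stores n + 1 offsets into its data buffer.
offsets = [0]
for value in values:
    offsets.append(offsets[-1] + len(value.encode("utf-8")))

# Assumption: the last offset (slot 32) stayed 0 instead of the total byte length.
broken = offsets[:-1] + [0]

print(first_non_monotonic(offsets))  # correct offsets pass the check
print(first_non_monotonic(broken))   # slot 32 holds 0, which is below slot 31
```

With correct offsets, slot 31 is 264 and slot 32 is 272, the total byte length of the data. Replacing only the last offset with 0 gives the same "slot 32: 0 < 264" as the reported failure, which suggests the final offset is the one that is wrong.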

Because {{arr2}} is invalid, using it can crash the process. For example, {{arr.diff(arr2)}} aborts with:

{code}
terminate called after throwing an instance of 'std::length_error'
  what():  basic_string::_S_create
Aborted (core dumped)
{code}



