maartenbreddels opened a new pull request #9000:
URL: https://github.com/apache/arrow/pull/9000


   Needs a rebase after https://github.com/apache/arrow/pull/8621 is merged
   
   I totally agree with 
https://github.com/python/cpython/blob/c9bc290dd6e3994a4ead2a224178bcba86f0c0e4/Objects/sliceobject.c#L252
 
   
   This was tricky to get right; the main difficulty is in manually dealing 
with reverse iterators. I therefore added extra guardrails by having the 
Python unit tests cover a lot of cases. All edge cases they detected have been 
translated to the C++ unit test suite, so the Python tests could be trimmed to 
reduce pytest execution cost (they add 1 second).
   
   Slicing follows Python's `[start, stop)` semantics (`start` inclusive, 
`stop` exclusive), where an index refers to a codeunit (apparently like Python, 
though this is badly documented), and negative indices count from the right. 
Any `step != 0` is supported, as in Python. 
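
As a rough illustration of these semantics, here is a pure-Python sketch that slices UTF-8 encoded bytes by whole characters rather than by bytes and agrees with Python's own `str` slicing. The helper name `slice_utf8_bytes` is hypothetical and this is not the C++ kernel from this PR; it materializes character offsets up front for simplicity, which sidesteps the reverse-iteration problem rather than solving it.

```python
def slice_utf8_bytes(data: bytes, start=None, stop=None, step=1) -> bytes:
    """Slice UTF-8 bytes by character index with Python [start, stop) semantics."""
    # A character starts at every byte that is not a continuation byte (0b10xxxxxx).
    starts = [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]
    starts.append(len(data))  # end offset of the last character
    n_chars = len(starts) - 1
    out = bytearray()
    # slice.indices() clamps out-of-range bounds and resolves negative indices;
    # it raises ValueError for step == 0, matching the `step != 0` requirement.
    for i in range(*slice(start, stop, step).indices(n_chars)):
        out += data[starts[i]:starts[i + 1]]
    return bytes(out)


s = "𝑓öõḍš"
assert slice_utf8_bytes(s.encode("utf-8"), 1, 4).decode("utf-8") == s[1:4]
assert slice_utf8_bytes(s.encode("utf-8"), -2, None).decode("utf-8") == s[-2:]
assert slice_utf8_bytes(s.encode("utf-8"), 4, 0, -2).decode("utf-8") == s[4:0:-2]
```

Unlike the kernel described here, this sketch accepts `None` for `start` and `stop`.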
   
   The only thing we cannot easily support is something like reversing a 
string. In Python one can write `s[::-1]` or `s[-1::-1]`, but we do not support 
empty values with the Option machinery (we model each bound as a C `int64`). To 
mimic this, we can call `pc.utf8_slice_codeunits(ar, start=-1, 
stop=-sys.maxsize, step=-1)` (i.e. pass a very large negative value as the stop). 
   
   For instance, libraries such as Pandas and Vaex can do something like that. 
I confirmed that it works by modifying the unit test like this:
   ```python
   import sys

   import pyarrow as pa
   import pyarrow.compute as pc
   import pytest


   @pytest.mark.parametrize('start', list(range(-6, 6)) + [None])
   @pytest.mark.parametrize('stop', list(range(-6, 6)) + [None])
   @pytest.mark.parametrize('step', [-3, -2, -1, 1, 2, 3])
   def test_slice_compatibility(start, stop, step):
       arr = pa.array(["", "𝑓", "𝑓ö", "𝑓öõ", "𝑓öõḍ", "𝑓öõḍš"])
       # Reference result: plain Python slicing of each string.
       expected = pa.array([k.as_py()[start:stop:step] for k in arr])
       # Replace None with a sentinel far outside any valid index.
       if start is None:
           start = -sys.maxsize if step > 0 else sys.maxsize
       if stop is None:
           stop = sys.maxsize if step > 0 else -sys.maxsize
       result = pc.utf8_slice_codeunits(arr, start=start, stop=stop, step=step)
       assert expected.equals(result)
   ```
   
   So libraries using this can implement the full Python behavior with this 
workaround.
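
A minimal sketch of how a downstream library might package this workaround. The helper name `fill_slice_bounds` is hypothetical and not part of Arrow. It only maps `None` bounds to the sentinels used in the test above, and the check runs against plain Python strings, so it does not need pyarrow.

```python
import sys


def fill_slice_bounds(start, stop, step=1):
    """Replace None bounds with int sentinels that behave like omitted bounds."""
    if step is None:
        step = 1
    if step == 0:
        raise ValueError("slice step cannot be zero")
    if start is None:
        # An omitted start means "from the far end in the direction of travel".
        start = -sys.maxsize if step > 0 else sys.maxsize
    if stop is None:
        stop = sys.maxsize if step > 0 else -sys.maxsize
    return start, stop, step


# Check against plain Python slicing: the sentinels should act like None.
s = "𝑓öõḍš"
for bounds in [(None, None, -1), (-1, None, -1), (None, 2, 1), (3, None, -2)]:
    start, stop, step = fill_slice_bounds(*bounds)
    assert s[start:stop:step] == s[slice(*bounds)]
```

The returned triple could then be passed as `start`, `stop` and `step` to `pc.utf8_slice_codeunits`, as the test above does.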
   
   



