[GitHub] [arrow] jorisvandenbossche commented on pull request #34570: GH-34568: [C++][Python] Expose Run-End Encoded arrays in Python Arrow

via GitHub Thu, 16 Mar 2023 13:12:05 -0700


jorisvandenbossche commented on PR #34570:
URL: https://github.com/apache/arrow/pull/34570#issuecomment-1472673013


   Something else: you can create an invalid REE array with non-increasing run 
ends:
   
   ```
   In [2]: a= pa.RunEndEncodedArray.from_arrays(5, [2, 4, 2, 5], [1, 2, 3, 4])
   
   In [3]: a
   Out[3]: 
   <pyarrow.lib.RunEndEncodedArray object at 0x7f8138ffa920>
   
   -- run_ends:
     [
       2,
       4,
       2,
       5
     ]
   -- values:
     [
       1,
       2,
       3,
       4
     ]
   
   In [4]: pc.run_end_decode(a)
   Segmentation fault (core dumped)
   ```
   
   And as you can see, decoding it then segfaults. 
   The full validation actually catches this:
   
   ```
   In [4]: a.validate(full=True)
   ...
   ArrowInvalid: Every run end must be strictly greater than the previous run end, but run_ends[2] is 2 and run_ends[1] is 4
   ```
   
   But the constructor only does the cheap validation (without `full=True`). 
I suppose it is always a bit of a trade-off which checks count as necessary / 
cheap and which are only part of the full validation (the same is true for 
the offsets in variable-size list/binary arrays).
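
To make the trade-off concrete, here is a minimal pure-Python sketch of the two validation tiers. It is not Arrow's implementation: the function names `cheap_validate` and `full_validate` are hypothetical, and which checks land in the cheap tier is an assumption made for illustration. Only the strictly-increasing rule and its error message are taken from the output above.

```python
from typing import Any, Sequence


def cheap_validate(logical_length: int,
                   run_ends: Sequence[int],
                   values: Sequence[Any]) -> None:
    """Constant-time structural checks (assumed tier split, for illustration)."""
    if len(run_ends) != len(values):
        raise ValueError("run_ends and values must have the same length")
    if logical_length > 0 and (not run_ends or run_ends[-1] < logical_length):
        raise ValueError("the last run end must cover the logical length")


def full_validate(logical_length: int,
                  run_ends: Sequence[int],
                  values: Sequence[Any]) -> None:
    """Cheap checks plus an O(n) pass over every run end."""
    cheap_validate(logical_length, run_ends, values)
    previous = 0
    for i, end in enumerate(run_ends):
        if end <= previous:
            if i == 0:
                raise ValueError(f"run_ends[0] must be at least 1, but is {end}")
            raise ValueError(
                "Every run end must be strictly greater than the previous "
                f"run end, but run_ends[{i}] is {end} and "
                f"run_ends[{i - 1}] is {previous}"
            )
        previous = end
```

With the inputs from the example (`5, [2, 4, 2, 5], [1, 2, 3, 4]`), the cheap tier in this sketch passes because the lengths match and the last run end is 5, while the full tier rejects `run_ends[2]`. The O(n) scan is exactly the part a constructor would have to pay for on every call.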


