[ 
https://issues.apache.org/jira/browse/ARROW-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390076#comment-16390076
 ] 

ASF GitHub Bot commented on ARROW-2280:
---------------------------------------

wesm closed pull request #1719: ARROW-2280: [Python] Return the offset for the buffers in pyarrow.Array
URL: https://github.com/apache/arrow/pull/1719
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/python/pyarrow/array.pxi b/python/pyarrow/array.pxi
index 7899d9dbbd..e785c0ec5c 100644
--- a/python/pyarrow/array.pxi
+++ b/python/pyarrow/array.pxi
@@ -483,10 +483,23 @@ cdef class Array:
         with nogil:
             check_status(ValidateArray(deref(self.ap)))
 
+    property offset:
+
+        def __get__(self):
+            """
+            A relative position into another array's data, to enable zero-copy
+            slicing. This value defaults to zero but must be applied on all
+            operations with the physical storage buffers.
+            """
+            return self.sp_array.get().offset()
+
     def buffers(self):
         """
         Return a list of Buffer objects pointing to this array's physical
         storage.
+
+        To correctly interpret these buffers, you need to also apply the offset
+        multiplied by the size of the stored data type.
         """
         res = []
         _append_array_buffers(self.sp_array.get().data().get(), res)
diff --git a/python/pyarrow/includes/libarrow.pxd b/python/pyarrow/includes/libarrow.pxd
index d95f01661c..456fcca360 100644
--- a/python/pyarrow/includes/libarrow.pxd
+++ b/python/pyarrow/includes/libarrow.pxd
@@ -103,6 +103,7 @@ cdef extern from "arrow/api.h" namespace "arrow" nogil:
 
         int64_t length()
         int64_t null_count()
+        int64_t offset()
         Type type_id()
 
         int num_fields()
diff --git a/python/pyarrow/tests/test_array.py b/python/pyarrow/tests/test_array.py
index c1131a0023..f034d78b39 100644
--- a/python/pyarrow/tests/test_array.py
+++ b/python/pyarrow/tests/test_array.py
@@ -600,6 +600,15 @@ def test_buffers_primitive():
     assert 1 <= len(null_bitmap) <= 64  # XXX this is varying
     assert bytearray(null_bitmap)[0] == 0b00001011
 
+    # Slicing does not affect the buffers but the offset
+    a_sliced = a[1:]
+    buffers = a_sliced.buffers()
+    assert a_sliced.offset == 1
+    assert len(buffers) == 2
+    null_bitmap = buffers[0].to_pybytes()
+    assert 1 <= len(null_bitmap) <= 64  # XXX this is varying
+    assert bytearray(null_bitmap)[0] == 0b00001011
+
     assert struct.unpack('hhxxh', buffers[1].to_pybytes()) == (1, 2, 4)
 
     a = pa.array(np.int8([4, 5, 6]))
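
For readers of the diff above, here is a minimal sketch of what "applying the
offset" means when reading the buffers of a sliced array. It does not use
pyarrow: decode_int16 is a hypothetical helper written for illustration, and
the raw bytes are built by hand to mirror the values checked in
test_buffers_primitive (an int16 array [1, 2, None, 4]). It assumes
little-endian data and a validity bitmap with the least significant bit first.
With pyarrow, the same inputs would presumably come from
a_sliced.buffers()[i].to_pybytes(), a_sliced.offset and len(a_sliced).

```python
import struct


def decode_int16(validity, data, offset, length):
    """Decode `length` int16 slots starting at logical position `offset`.

    Hypothetical helper for illustration only, not part of pyarrow.
    """
    values = []
    for i in range(offset, offset + length):
        # The validity buffer is a bitmap, so the offset counts bits here.
        is_valid = validity is None or (validity[i // 8] >> (i % 8)) & 1
        if is_valid:
            # In the data buffer the offset is scaled by the item size
            # (2 bytes for int16).
            (value,) = struct.unpack_from("<h", data, i * 2)
            values.append(value)
        else:
            values.append(None)
    return values


# Same physical storage as in the test: bitmap 0b00001011, values 1, 2, _, 4.
validity = bytes([0b00001011])
data = struct.pack("<4h", 1, 2, 0, 4)

full = decode_int16(validity, data, offset=0, length=4)
sliced = decode_int16(validity, data, offset=1, length=3)
print(full)    # [1, 2, None, 4]
print(sliced)  # [2, None, 4]
```

Both calls read the same bytes; only the offset and length differ, which is
why the buffers alone are not enough to interpret a sliced array. Note that the
validity bitmap needs a bit offset rather than a byte offset.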


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] pyarrow.Array.buffers should also include the offsets
> --------------------------------------------------------------
>
>                 Key: ARROW-2280
>                 URL: https://issues.apache.org/jira/browse/ARROW-2280
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Uwe L. Korn
>            Assignee: Uwe L. Korn
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> Currently we only return the buffers, but they cannot be interpreted without
> their offsets. In particular, the validity bitmap will have a non-zero offset
> in most cases where the array was sliced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
