This is an automated email from the ASF dual-hosted git repository.

alenka pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
     new 28266f1f17 MINOR: [Python][Docs] Add examples for 
`MapArray.from_arrays` (#37656)
28266f1f17 is described below

commit 28266f1f173f27c0db2aafd9497d4af7eb3f441c
Author: Slobodan Ilic <[email protected]>
AuthorDate: Thu Sep 14 17:46:32 2023 +0200

    MINOR: [Python][Docs] Add examples for `MapArray.from_arrays` (#37656)
    
    
    
    ### Rationale for this change
    This PR enriched the `MapArray.from_arrays` with some nice examples. The 
examples are from the real-world scenario of working with survey data (scaled 
down, of course).
    
    ### What changes are included in this PR?
    The only change that this PR presents is to the docstring of the 
`MapArray.from_arrays` function.
    
    ### Are these changes tested?
    Does not apply
    
    ### Are there any user-facing changes?
    Yes, the docstring of the `MapArray.from_arrays` function.
    
    Lead-authored-by: Slobodan Ilic <[email protected]>
    Co-authored-by: Slobodan Ilic <[email protected]>
    Co-authored-by: Alenka Frim <[email protected]>
    Co-authored-by: Dane Pitkin <[email protected]>
    Signed-off-by: AlenkaF <[email protected]>
---
 python/pyarrow/array.pxi | 73 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/python/pyarrow/array.pxi b/python/pyarrow/array.pxi
index e26b1ad329..e36d8b2f04 100644
--- a/python/pyarrow/array.pxi
+++ b/python/pyarrow/array.pxi
@@ -2363,6 +2363,79 @@ cdef class MapArray(ListArray):
         Returns
         -------
         map_array : MapArray
+
+        Examples
+        --------
+        First, let's understand the structure of our dataset when viewed in a 
rectangular data model. 
+        The total of 5 respondents answered the question "How much did you 
like the movie x?".
+        The value -1 in the integer array means that the value is missing. The 
boolean array
+        represents the null bitmask corresponding to the missing values in the 
integer array.
+
+        >>> import pyarrow as pa
+        >>> movies_rectangular = np.ma.masked_array([
+        ...     [10, -1, -1],
+        ...     [8, 4, 5],
+        ...     [-1, 10, 3],
+        ...     [-1, -1, -1],
+        ...     [-1, -1, -1]
+        ... ],
+        ... [
+        ...     [False, True, True],
+        ...     [False, False, False],
+        ...     [True, False, False],
+        ...     [True, True, True],
+        ...     [True, True, True],
+        ... ])
+
+        To represent the same data with the MapArray and from_arrays, the data 
is
+        formed like this:
+
+        >>> offsets = [
+        ...     0, #  -- row 1 start
+        ...     1, #  -- row 2 start
+        ...     4, #  -- row 3 start
+        ...     6, #  -- row 4 start
+        ...     6, #  -- row 5 start
+        ...     6, #  -- row 5 end
+        ... ]
+        >>> movies = [
+        ...     "Dark Knight", #  ---------------------------------- row 1
+        ...     "Dark Knight", "Meet the Parents", "Superman", #  -- row 2
+        ...     "Meet the Parents", "Superman", #  ----------------- row 3
+        ... ]
+        >>> likings = [
+        ...     10, #  -------- row 1
+        ...     8, 4, 5, #  --- row 2
+        ...     10, 3 #  ------ row 3
+        ... ]
+        >>> pa.MapArray.from_arrays(offsets, movies, likings).to_pandas()
+        0                                  [(Dark Knight, 10)]
+        1    [(Dark Knight, 8), (Meet the Parents, 4), (Sup...
+        2              [(Meet the Parents, 10), (Superman, 3)]
+        3                                                   []
+        4                                                   []
+        dtype: object
+
+        If the data in the empty rows needs to be marked as missing, it's 
possible
+        to do so by modifying the offsets argument, so that we specify `None` 
as
+        the starting positions of the rows we want marked as missing. The end 
row
+        offset still has to refer to the existing value from keys (and values):
+
+        >>> offsets = [
+        ...     0, #  ----- row 1 start
+        ...     1, #  ----- row 2 start
+        ...     4, #  ----- row 3 start
+        ...     None, #  -- row 4 start
+        ...     None, #  -- row 5 start
+        ...     6, #  ----- row 5 end
+        ... ]
+        >>> pa.MapArray.from_arrays(offsets, movies, likings).to_pandas()
+        0                                  [(Dark Knight, 10)]
+        1    [(Dark Knight, 8), (Meet the Parents, 4), (Sup...
+        2              [(Meet the Parents, 10), (Superman, 3)]
+        3                                                 None
+        4                                                 None
+        dtype: object
         """
         cdef:
             Array _offsets, _keys, _items

Reply via email to