This is an automated email from the ASF dual-hosted git repository.
alenka pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push:
new 28266f1f17 MINOR: [Python][Docs] Add examples for `MapArray.from_arrays` (#37656)
28266f1f17 is described below
commit 28266f1f173f27c0db2aafd9497d4af7eb3f441c
Author: Slobodan Ilic <[email protected]>
AuthorDate: Thu Sep 14 17:46:32 2023 +0200
MINOR: [Python][Docs] Add examples for `MapArray.from_arrays` (#37656)
### Rationale for this change
This PR adds examples to the `MapArray.from_arrays` docstring. The
examples are drawn from a real-world scenario of working with survey data
(scaled down, of course).
### What changes are included in this PR?
The only change in this PR is to the docstring of the
`MapArray.from_arrays` function.
### Are these changes tested?
Does not apply
### Are there any user-facing changes?
Yes, the docstring of the `MapArray.from_arrays` function.
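As a rough illustration of the offsets convention that the new docstring examples rely on, the sketch below models it in plain Python. It is not pyarrow's implementation: `rows_from_offsets` is a hypothetical helper introduced only for this note, and it assumes the final offset is never `None`, as the docstring requires.

```python
def rows_from_offsets(offsets, keys, values):
    """Hypothetical helper: plain-Python model of the offsets convention."""
    # Fill each None with the next non-None offset, scanning from the end.
    # Assumes the last offset is not None.
    filled = list(offsets)
    for i in range(len(filled) - 2, -1, -1):
        if filled[i] is None:
            filled[i] = filled[i + 1]

    rows = []
    for i in range(len(offsets) - 1):
        if offsets[i] is None:
            rows.append(None)  # a None start offset marks the row as missing
        else:
            start, end = filled[i], filled[i + 1]
            rows.append(list(zip(keys[start:end], values[start:end])))
    return rows


movies = ["Dark Knight",
          "Dark Knight", "Meet the Parents", "Superman",
          "Meet the Parents", "Superman"]
likings = [10, 8, 4, 5, 10, 3]

empty_rows = rows_from_offsets([0, 1, 4, 6, 6, 6], movies, likings)
missing_rows = rows_from_offsets([0, 1, 4, None, None, 6], movies, likings)
```

Here `empty_rows` ends with two empty lists, while `missing_rows` ends with two `None` entries, mirroring the difference between the two docstring outputs.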
Lead-authored-by: Slobodan Ilic <[email protected]>
Co-authored-by: Slobodan Ilic <[email protected]>
Co-authored-by: Alenka Frim <[email protected]>
Co-authored-by: Dane Pitkin <[email protected]>
Signed-off-by: AlenkaF <[email protected]>
---
python/pyarrow/array.pxi | 73 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 73 insertions(+)
diff --git a/python/pyarrow/array.pxi b/python/pyarrow/array.pxi
index e26b1ad329..e36d8b2f04 100644
--- a/python/pyarrow/array.pxi
+++ b/python/pyarrow/array.pxi
@@ -2363,6 +2363,79 @@ cdef class MapArray(ListArray):
Returns
-------
map_array : MapArray
+
+ Examples
+ --------
+ First, let's understand the structure of our dataset when viewed in a rectangular data model.
+ A total of 5 respondents answered the question "How much did you like the movie x?".
+ The value -1 in the integer array means that the value is missing. The boolean array
+ represents the null bitmask corresponding to the missing values in the integer array.
+
+ >>> import numpy as np
+ >>> import pyarrow as pa
+ >>> movies_rectangular = np.ma.masked_array([
+ ... [10, -1, -1],
+ ... [8, 4, 5],
+ ... [-1, 10, 3],
+ ... [-1, -1, -1],
+ ... [-1, -1, -1]
+ ... ],
+ ... [
+ ... [False, True, True],
+ ... [False, False, False],
+ ... [True, False, False],
+ ... [True, True, True],
+ ... [True, True, True],
+ ... ])
+
+ To represent the same data with `MapArray.from_arrays`, the data is
+ formed like this:
+
+ >>> offsets = [
+ ... 0, # -- row 1 start
+ ... 1, # -- row 2 start
+ ... 4, # -- row 3 start
+ ... 6, # -- row 4 start
+ ... 6, # -- row 5 start
+ ... 6, # -- row 5 end
+ ... ]
+ >>> movies = [
+ ... "Dark Knight", # ---------------------------------- row 1
+ ... "Dark Knight", "Meet the Parents", "Superman", # -- row 2
+ ... "Meet the Parents", "Superman", # ----------------- row 3
+ ... ]
+ >>> likings = [
+ ... 10, # -------- row 1
+ ... 8, 4, 5, # --- row 2
+ ... 10, 3 # ------ row 3
+ ... ]
+ >>> pa.MapArray.from_arrays(offsets, movies, likings).to_pandas()
+ 0 [(Dark Knight, 10)]
+ 1 [(Dark Knight, 8), (Meet the Parents, 4), (Sup...
+ 2 [(Meet the Parents, 10), (Superman, 3)]
+ 3 []
+ 4 []
+ dtype: object
+
+ If the data in the empty rows needs to be marked as missing, it's possible
+ to do so by modifying the offsets argument, so that we specify `None` as
+ the starting positions of the rows we want marked as missing. The end row
+ offset still has to refer to the existing value from keys (and values):
+
+ >>> offsets = [
+ ... 0, # ----- row 1 start
+ ... 1, # ----- row 2 start
+ ... 4, # ----- row 3 start
+ ... None, # -- row 4 start
+ ... None, # -- row 5 start
+ ... 6, # ----- row 5 end
+ ... ]
+ >>> pa.MapArray.from_arrays(offsets, movies, likings).to_pandas()
+ 0 [(Dark Knight, 10)]
+ 1 [(Dark Knight, 8), (Meet the Parents, 4), (Sup...
+ 2 [(Meet the Parents, 10), (Superman, 3)]
+ 3 None
+ 4 None
+ dtype: object
"""
cdef:
Array _offsets, _keys, _items