slobodan-ilic commented on code in PR #37656:
URL: https://github.com/apache/arrow/pull/37656#discussion_r1322518758
##########
python/pyarrow/array.pxi:
##########
@@ -2363,6 +2363,78 @@ cdef class MapArray(ListArray):
Returns
-------
map_array : MapArray
+
+ Examples
+ --------
+ First, let's understand the structure of our dataset when viewed in a rectangular data model.
+ The total of 5 respondents answered the question "How much did you like the movie x?".
+ The value -1 in the integer array means that the value is missing. The boolean array
+ represents the null bitmask corresponding to the missing values in the integer array.
+
+ >>> movies_rectangular = np.ma.masked_array([
+ >>> [10, -1, -1],
+ >>> [8, 4, 5],
+ >>> [-1, 10, 3],
+ >>> [-1, -1, -1],
+ >>> [-1, -1, -1]
+ >>> ],
+ >>> [
+ >>> [False, True, True],
+ >>> [False, False, False],
+ >>> [True, False, False],
+ >>> [True, True, True],
+ >>> [True, True, True],
+ >>> ])
+
+ To represent the same data with the MapArray and from_arrays, the data is
+ formed like this:
+
+ >>> offsets = [
+ >>> 0, # -- row 1 start
+ >>> 1, # -- row 2 start
+ >>> 4, # -- row 3 start
+ >>> 6, # -- row 4 start
+ >>> 6, # -- row 5 start
+ >>> 6, # -- row 5 end
+ >>> ]
+ >>> movies = [
+ >>> "Dark Knight", # ---------------------------------- row 1
+ >>> "Dark Knight", "Meet the Parents", "Superman", # -- row 2
+ >>> "Meet the Parents", "Superman", # ----------------- row 3
Review Comment:
Sure, we could do that, but from experience (working in survey analytics), an
example is much more descriptive to users when they can relate it to something
they already know. I was thinking about switching to shorter names so that
everything prints nicely, maybe Mask, Superman, Batman. But I don't think the
names Movie A, Movie B and Movie C serve the purpose of providing immediate
clarity to the user. Sure, they're workable and technically correct, but IMHO
they miss that slight edge (or wit).
On the trademark side, I don't know for certain, but I'd say that we're not
using any content here (and are therefore fine). But if it's better to
anonymize it, I can do it in an instant. Let me know your thoughts.
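
To make the shorter-names idea concrete, here is a minimal sketch (not the code in the PR) of how the rectangular ratings map onto the `offsets`/keys/items layout used in the docstring. The column labels `Batman`, `Mask`, `Superman` and their order are assumptions for illustration, as is the helper `to_map_layout`; `None` stands in for the masked `-1` cells.

```python
# Hypothetical short labels, one per column of the rectangular array.
columns = ["Batman", "Mask", "Superman"]

# Same values as the docstring's masked array; None marks a missing answer.
ratings = [
    [10, None, None],
    [8, 4, 5],
    [None, 10, 3],
    [None, None, None],
    [None, None, None],
]


def to_map_layout(rows, labels):
    """Flatten rows into (offsets, keys, items), skipping missing cells."""
    offsets, keys, items = [0], [], []
    for row in rows:
        for label, value in zip(labels, row):
            if value is not None:
                keys.append(label)
                items.append(value)
        # The end of this row is also the start of the next one.
        offsets.append(len(keys))
    return offsets, keys, items


offsets, keys, items = to_map_layout(ratings, columns)
print(offsets)
print(keys)
print(items)
```

The three lists are what would then be passed to `pa.MapArray.from_arrays(offsets, keys, items)`. Rows 4 and 5 have no answers, so their start and end offsets coincide.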