[ 
https://issues.apache.org/jira/browse/ARROW-7071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968354#comment-16968354
 ] 

Joris Van den Bossche commented on ARROW-7071:
----------------------------------------------

Now, I think the main question is: what API could we offer for this?

* A method on Array? Something like {{array.set_validity_bitmap(..)}} or 
{{array.set_null_bitmap(..)}} (though I'm not sure it needs to be exposed 
that prominently)
* A settable attribute like {{array.null_bitmap}}
* A function to create a new array from a given array + bitmap? This could be 
similar to {{Array.from_buffers}}, but more convenient to use (you can 
already use {{Array.from_buffers}} to achieve this today)
* Alternatively, we could expand {{pa.array(values, mask=[..])}} to accept a 
pyarrow array as values, and then use the {{mask}} keyword to specify the nulls 
as a boolean mask (although the current behaviour here is that the final 
bitmap is a combination of the nulls in the values and the mask, so this is 
not a way to override the bitmap, but maybe that's actually good)

One way to avoid the issue of "previously-null values" could be to only allow 
setting the bitmap when the array does not already have one.
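
To make the difference between combining and overriding concrete, here is a 
minimal pure-Python sketch. It does not call pyarrow: an array's validity is 
modelled as a list of booleans ({{None}} meaning "no bitmap yet"), and the 
helper names {{pack_validity}}, {{combine_mask}} and {{set_validity}} are 
hypothetical, not existing or proposed pyarrow functions. The mask follows the 
{{pa.array(values, mask=..)}} convention where True means null.

```python
from typing import List, Optional


def pack_validity(valid: List[bool]) -> bytes:
    """Pack flags into a bitmap, least-significant bit first (1 = valid)."""
    out = bytearray((len(valid) + 7) // 8)
    for i, is_valid in enumerate(valid):
        if is_valid:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def combine_mask(valid: Optional[List[bool]], mask: List[bool]) -> List[bool]:
    """Combine: a slot stays valid only if it was valid AND is not masked.

    `valid=None` models an array without a validity bitmap (no nulls);
    `mask` uses True = null.
    """
    if valid is None:
        valid = [True] * len(mask)
    return [v and not m for v, m in zip(valid, mask)]


def set_validity(valid, new_valid, allow_replace=False):
    """Override: take `new_valid` as-is; by default refuse if a bitmap exists."""
    if valid is not None and not allow_replace:
        raise ValueError("array already has a validity bitmap")
    return list(new_valid)


existing = [True, False, True, True]   # slot 1 is currently null
mask = [False, False, True, False]     # True = null

# Combining can only add nulls: slot 1 stays null, slot 2 becomes null.
combined = combine_mask(existing, mask)

# Overriding can flip slot 1 back to valid, exposing whatever bytes sit in
# the data buffer for that slot.
overridden = set_validity(existing, [True, True, False, True],
                          allow_replace=True)

print(combined, pack_validity(combined))      # bitmap byte 0b00001001
print(overridden, pack_validity(overridden))  # bitmap byte 0b00001011
```

With the default {{allow_replace=False}}, {{set_validity}} only succeeds when 
there is no bitmap yet, which is the restriction suggested above: no 
previously-null slot can ever be exposed.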

That would be enough for my original use case, where I want to create a 
StructArray from two pyarrow arrays and also give it a null bitmap:

{code}
import pyarrow as pa

pa.StructArray.from_arrays(
    [pa.array([1, 2, 3]), pa.array([2, 3, 4])], names=['a', 'b'])
{code}

For this very specific case, another option would be to let 
{{pa.StructArray.from_arrays}} accept a {{bitmap}} or {{mask}} keyword, but 
that is of course not a general solution for other types.

> [Python] Add Array convenience method to create "masked" view with different 
> validity bitmap
> --------------------------------------------------------------------------------------------
>
>                 Key: ARROW-7071
>                 URL: https://issues.apache.org/jira/browse/ARROW-7071
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Wes McKinney
>            Priority: Major
>             Fix For: 1.0.0
>
>
> NB: I'm not sure what kind of pitfalls there might be if replacing an 
> existing validity bitmap and exposing some previously-null values



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
