[
https://issues.apache.org/jira/browse/ARROW-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15192701#comment-15192701
]
Dan Robinson commented on ARROW-62:
-----------------------------------
For whatever it's worth: it seems PostgreSQL uses 0 in a null bitmap to
indicate null values
(http://www.postgresql.org/docs/8.0/static/storage-page-layout.html) while
MySQL and SQL Server use 1
(https://dev.mysql.com/doc/internals/en/null-bitmap.html,
http://www.sqlpassion.at/archive/2011/06/29/the-mystery-of-the-null-bitmap-mask/).
Drill also uses 0, while NumPy uses 1, so there does not seem to be an
established convention yet. IMHO the validity-map approach, where 0 marks a
null, is a little more elegant.
> Format: Are the nulls bits 0 or 1 for null values?
> --------------------------------------------------
>
> Key: ARROW-62
> URL: https://issues.apache.org/jira/browse/ARROW-62
> Project: Apache Arrow
> Issue Type: Bug
> Components: Format
> Reporter: Wes McKinney
> Assignee: Wes McKinney
>
> As brought up by Dan Robinson on the mailing list (thank you for catching
> this!), the format documents are inconsistent with the imported ValueVectors
> code in how they represent nulls. Since I drafted these format documents
> initially, I'll take the blame for the inconsistency:
> * Drill / ValueVectors uses the value 0 for null data, and 1 for non-null data
> * The format document currently states the opposite (values are null if the
> bit is set)
> I can see arguments both ways, but one argument for the ValueVectors style is
> that values must be explicitly set to be non-null, versus uninitialized
> values being accidentally interpreted as being non-null. When initializing a
> bitmap, one can {{memset}} the bits to 0, then set them to 1 when non-null
> values are appended during construction.
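
The memset-then-set pattern described above could look roughly like the
following C sketch. The helper names are hypothetical (they are not taken
from the Arrow or Drill code), and the least-significant-bit-first ordering
within each byte is an assumption made only for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helpers for illustration; not part of Arrow or Drill. */

/* Bytes needed to hold one validity bit per value. */
static size_t bitmap_bytes(size_t n_values) {
    return (n_values + 7) / 8;
}

/* Allocate a validity bitmap where every slot starts out null (bit = 0). */
static uint8_t *bitmap_new_all_null(size_t n_values) {
    size_t n = bitmap_bytes(n_values);
    uint8_t *bits = malloc(n);
    if (bits != NULL) {
        memset(bits, 0, n);  /* 0 means null under the ValueVectors style */
    }
    return bits;
}

/* Mark slot i as non-null; assumes LSB-first bit order within a byte. */
static void bitmap_set_valid(uint8_t *bits, size_t i) {
    bits[i / 8] |= (uint8_t)(1u << (i % 8));
}

/* Return 1 if slot i is non-null, 0 if it is null. */
static int bitmap_is_valid(const uint8_t *bits, size_t i) {
    return (bits[i / 8] >> (i % 8)) & 1;
}
```

With this layout, a builder would call bitmap_set_valid only when it appends
a non-null value, so any slot that is never written keeps reading as null.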
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)