[
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219836#comment-16219836
]
Jacques Nadeau commented on ARROW-1710:
---------------------------------------
I'm one of the voices strongly arguing for dropping the additional class
objects. (I also was the one who originally introduced the two separate sets
when the code was first developed.) My experience has been the following:
* Extra complexity of managing two different runtime classes is very expensive
(maintenance, coercing between them, managing runtime code generation, etc.)
* Most source data is actually declared as nullable but rarely has nulls
As such, having an adaptive interaction where you look at cells 64 values at a
time and adapt your behavior based on actual nullability (as opposed to
declared nullability) provides a much better performance lift in real-world use
cases than having specialized code for declared non-nullable situations.
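
To make the adaptive idea concrete, here is a minimal sketch in plain Java. It
is not Arrow code: the class name, the long[] validity layout (bit i of word
i / 64 set means value i is non-null), and the sum operation are all
assumptions chosen for illustration. It inspects one 64-bit validity word at a
time and takes a branch-free inner loop whenever that word shows no nulls.

```java
// Hypothetical illustration, not part of the Arrow API.
final class AdaptiveNullSum {
    // Sums values[0..count) whose validity bit is set.
    // Assumed layout: bit (i & 63) of validity[i >>> 6] is 1 when value i is non-null.
    public static long sum(int[] values, long[] validity, int count) {
        long total = 0;
        for (int base = 0; base < count; base += 64) {
            int end = Math.min(base + 64, count);
            int n = end - base;
            // Mask covering only the bits that belong to this (possibly partial) batch.
            long fullMask = (n == 64) ? -1L : (1L << n) - 1;
            long bits = validity[base >>> 6] & fullMask;
            if (bits == fullMask) {
                // Fast path: no nulls actually present in these values.
                for (int i = base; i < end; i++) {
                    total += values[i];
                }
            } else {
                // Slow path: visit only the set bits (an all-null word falls through at once).
                while (bits != 0) {
                    total += values[base + Long.numberOfTrailingZeros(bits)];
                    bits &= bits - 1; // clear the lowest set bit
                }
            }
        }
        return total;
    }
}
```

With data that is declared nullable but rarely contains nulls, almost every
batch takes the fast path, so the per-value null check mostly disappears
without needing a separate non-nullable class.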
FYI: [~e.levine], the updated approach with vectors is moving to a situation
where we don't have a bit vector and ultimately also consolidates the buffer
for the bits and the fixed bytes in the same buffer. In that case, there is no
heap memory overhead and the direct memory overhead is 1 bit per value, far
less than necessary.
Also note that in reality, most people focused on super high performance Java
implementations interact directly with the memory. You can see an example of
how we do this here:
https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/sabot/op/common/ht2/Pivots.java#L89
If, in the future, people need the vector classes to have an additional set
of methods such as:
allocateNewNoNull()
setSafeIgnoreNull(int index, int value)
let's just add those when someone's use case requires it. No need to have an
extra set of vectors for that purpose.
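
As a rough sketch of what that could look like, the toy class below puts the
two methods named above next to the ordinary nullable path on a single class.
Everything here is hypothetical: ToyIntVector, its array-backed storage, and
the semantics given to allocateNewNoNull and setSafeIgnoreNull are assumptions
for illustration, not the Arrow vector API.

```java
import java.util.Arrays;

// Hypothetical illustration, not part of the Arrow API.
final class ToyIntVector {
    private int[] values = new int[0];
    private long[] validity = new long[0]; // bit set = value is non-null

    // Assumed semantics: allocate with every validity bit pre-set,
    // so writers never have to touch the validity bits.
    public void allocateNewNoNull(int capacity) {
        values = new int[capacity];
        validity = new long[(capacity + 63) >>> 6];
        Arrays.fill(validity, -1L);
    }

    // Assumed semantics: grow if needed, write the value, skip validity bookkeeping.
    public void setSafeIgnoreNull(int index, int value) {
        ensureCapacity(index + 1);
        values[index] = value;
    }

    // The ordinary nullable path stays available on the same class.
    public void setNull(int index) {
        ensureCapacity(index + 1);
        validity[index >>> 6] &= ~(1L << (index & 63));
    }

    public boolean isNull(int index) {
        return (validity[index >>> 6] & (1L << (index & 63))) == 0;
    }

    public int get(int index) {
        return values[index];
    }

    private void ensureCapacity(int needed) {
        if (needed <= values.length) {
            return;
        }
        int newCapacity = Math.max(needed, Math.max(8, values.length * 2));
        values = Arrays.copyOf(values, newCapacity);
        int oldWords = validity.length;
        validity = Arrays.copyOf(validity, (newCapacity + 63) >>> 6);
        // Newly added words keep the "no nulls" state.
        Arrays.fill(validity, oldWords, validity.length, -1L);
    }
}
```

The point of the sketch is only that the no-null fast path can live as a
couple of extra methods on one class rather than as a parallel class hierarchy.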
> [Java] Decide what to do with non-nullable vectors in new vector class
> hierarchy
> ---------------------------------------------------------------------------------
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
> Issue Type: Sub-task
> Components: Java - Vectors
> Reporter: Li Jin
> Fix For: 0.8.0
>
>
> So far the consensus seems to be to remove all non-nullable vectors.