What was the resolution of this discussion? Was a JIRA made?

It occurred to me recently that, if we decided that values masked by null bits need to be filled with a known value, this could open up optimizations in some use cases. For example, when reading a file into R, if we could specify what to use for the known null values, we could use R's missing value sentinels and then get pure zero-copy access.

Some related JIRAs:
https://issues.apache.org/jira/browse/ARROW-8348
https://issues.apache.org/jira/browse/ARROW-7767
https://issues.apache.org/jira/browse/ARROW-3263

Neal

On Sat, Feb 20, 2021 at 4:30 PM Antoine Pitrou <anto...@python.org> wrote:

>
> On 21/02/2021 at 01:05, Wes McKinney wrote:
> > I agree that we should avoid leaking uninitialized memory in places
> > where we have control over it. I could imagine a third party project
> > having UBSAN warnings and then tracing the origin of them to something
> > in Arrow that they then have to work around. As for the potential
> > performance implications, we'll have to be vigilant with
> > microbenchmarks.
>
> We're generally already doing this when we're careful, so we're already
> paying the price (which I would estimate intuitively quite small).
> Unfortunately, there doesn't seem to be an obvious way to check it
> systematically on CI, but Valgrind can occasionally uncover it.
>
> Regards
>
> Antoine.
>
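To make the sentinel idea at the top of this message concrete, here is a minimal sketch in plain Python (standard library only, not the Arrow API). It assumes an int32 column stored as a little-endian values buffer plus an Arrow-style validity bitmap (least-significant-bit ordering, set bit means non-null), and it overwrites every null slot with R's NA_integer_ value, which is the minimum 32-bit signed integer. The helper names `is_valid` and `fill_nulls_with_sentinel` are hypothetical and exist only for this illustration.

```python
import struct

# R represents NA_integer_ as the minimum 32-bit signed integer.
R_NA_INTEGER = -2**31


def is_valid(validity: bytes, i: int) -> bool:
    """Return True if slot i is non-null (LSB-ordered bitmap, 1 = valid)."""
    return (validity[i // 8] >> (i % 8)) & 1 == 1


def fill_nulls_with_sentinel(values: bytearray, validity: bytes,
                             length: int, sentinel: int = R_NA_INTEGER) -> None:
    """Hypothetical helper: overwrite null int32 slots in place with `sentinel`."""
    for i in range(length):
        if not is_valid(validity, i):
            struct.pack_into("<i", values, 4 * i, sentinel)


# Four int32 slots; slots 1 and 3 are null and hold arbitrary leftover data.
values = bytearray(struct.pack("<4i", 10, 12345, 30, -7))
validity = bytes([0b00000101])  # bits 0 and 2 set: slots 0 and 2 are valid

fill_nulls_with_sentinel(values, validity, 4)
decoded = list(struct.unpack("<4i", values))
print(decoded)
```

Once the masked slots hold the sentinel, a consumer that uses the same in-band convention could in principle read the values buffer directly without consulting the bitmap, which is the zero-copy case described above. Whether that is safe for a given type and consumer is an assumption this sketch does not settle.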