[
https://issues.apache.org/jira/browse/ARROW-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dimitri Vorona updated ARROW-2790:
----------------------------------
Description:
Hi,
currently, buffers can contain uninitalized memory when built with buffer
builders. This happens in two ways:
# The most common implementation of AppendNull(s) adds a null-bit to the
null-map and increments the lengths, leaving chunks of data values untouched
# The padding is never initialized
This can lead to a number of problems, some of more or less theoretical nature,
but it's worth fixing just to make valgrind happy again and allow it to run in
CI :) Also, leaking uninitialized memory over a wire is never nice, and
currently problem 1. ist never mitigated by the ipc writers (although the
problem 2 ist).
This PR changes the behaviour of AppendNull(s) and FinishInternal for the
affected builders and adds a lot of checks to ensure the correct
initialization. Most of the test succeed the current master, but cause valgrind
warnings, if the buffer builders aren't patched.
I decided not to test explicitly if the nulled is really 0, just that it can be
read without errors, since the specs don't specify exact values. I check the
padded memory values explicitly, but I'm not completely sure, what the best
course of action is there: we could also ignore the values in padding just
checking if they are initialized, or we could adjust the spec in this point.
was:
Hi,
currently, buffers can contain uninitalized memory when built with buffer
builders. This happens in two ways:
# The most common implementation of AppendNull(s) adds a null-bit to the
null-map and increments the lengths, leaving chunks of data values untouched
# The padding is never initialized
This can lead to a number of problems, some of more or less theoretical nature,
but it's worth fixing just to make valgrind happy again and allow it to run in
CI :) Also, leaking uninitialized memory over a wire is never nice, and
currently problem 1. ist never mitigated by the ipc writers (although the
problem 2 ist).
This PR changes the behaviour of AppendNull(s) and FinishInternal for the
affected builders and adds a lot of checks to ensure the correct
initialization. Most of the test succeed the current master, but cause valgrind
warnings, if the buffer builders aren't patched.
I decided not to test explicitly if the nulled is really 0, just that it can be
read without errors, since the specs don't specify exact values. I check the
padded memory values explicitly, but I'm not completely sure, what the best
course of action is there: we could also ignored the values in padding just
checking if they are initialized, or we could adjust the spec in this point.
> [C++] Buffers contain uninitialized memory
> ------------------------------------------
>
> Key: ARROW-2790
> URL: https://issues.apache.org/jira/browse/ARROW-2790
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Affects Versions: 0.9.0
> Reporter: Dimitri Vorona
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Hi,
> currently, buffers can contain uninitalized memory when built with buffer
> builders. This happens in two ways:
> # The most common implementation of AppendNull(s) adds a null-bit to the
> null-map and increments the lengths, leaving chunks of data values untouched
> # The padding is never initialized
> This can lead to a number of problems, some of more or less theoretical
> nature, but it's worth fixing just to make valgrind happy again and allow it
> to run in CI :) Also, leaking uninitialized memory over a wire is never nice,
> and currently problem 1. ist never mitigated by the ipc writers (although the
> problem 2 ist).
> This PR changes the behaviour of AppendNull(s) and FinishInternal for the
> affected builders and adds a lot of checks to ensure the correct
> initialization. Most of the test succeed the current master, but cause
> valgrind warnings, if the buffer builders aren't patched.
> I decided not to test explicitly if the nulled is really 0, just that it can
> be read without errors, since the specs don't specify exact values. I check
> the padded memory values explicitly, but I'm not completely sure, what the
> best course of action is there: we could also ignore the values in padding
> just checking if they are initialized, or we could adjust the spec in this
> point.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)