[ 
https://issues.apache.org/jira/browse/ARROW-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dimitri Vorona updated ARROW-2790:
----------------------------------
    Description: 
Hi,

currently, buffers can contain uninitalized memory when built with buffer 
builders. This happens in two ways:
 # The most common implementation of AppendNull(s) adds a null-bit to the 
null-map and increments the lengths, leaving chunks of data values untouched
 # The padding is never initialized

This can lead to a number of problems, some of more or less theoretical nature, 
but it's worth fixing just to make valgrind happy again and allow it to run in 
CI :) Also, leaking uninitialized memory over a wire is never nice, and 
currently problem 1. ist never mitigated by the ipc writers (although the 
problem 2 ist). 

This PR changes the behaviour of AppendNull(s) and FinishInternal for the 
affected builders and adds a lot of checks to ensure the correct 
initialization. Most of the test succeed the current master, but cause valgrind 
warnings, if the buffer builders aren't patched.

I decided not to test explicitly if the nulled is really 0, just that it can be 
read without errors, since the specs don't specify exact values. I check the 
padded memory values explicitly, but I'm not completely sure, what the best 
course of action is there: we could also ignored the values in padding just 
checking if they are initialized, or we could adjust the spec in this point. 

  was:
Hi,

currently, buffers can contain uninitalized memory when built with buffer 
builders. This happens in two ways:
 # The most common implementation of AppendNull(s) adds a null-bit to the 
null-map and increments the lengths, leaving chunks of data values untouched
 # The padding is never initialized

This can lead to a number of problems, some of more or less theoretical nature, 
but it's worth fixing just to make valgrind happy again and allow it to run in 
CI :) Also, leaking uninitialized memory over a wire is never nice, and 
currently problem 1. ist never mitigated by the ipc writers (although the 
problem 2 ist). 

This PR changes the behaviour of AppendNull(s) for the affected builders and 
adds a lot of checks to ensure the correct initialization. Most of the test 
succeed the current master, but cause valgrind warnings, if the buffer builders 
aren't patched.

I decided not to test explicitly if the nulled/padded memory is really 0, just 
that it can be read without errors, since the specs don't specify exact values.


> [C++] Buffers contain uninitialized memory
> ------------------------------------------
>
>                 Key: ARROW-2790
>                 URL: https://issues.apache.org/jira/browse/ARROW-2790
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>    Affects Versions: 0.9.0
>            Reporter: Dimitri Vorona
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hi,
> currently, buffers can contain uninitalized memory when built with buffer 
> builders. This happens in two ways:
>  # The most common implementation of AppendNull(s) adds a null-bit to the 
> null-map and increments the lengths, leaving chunks of data values untouched
>  # The padding is never initialized
> This can lead to a number of problems, some of more or less theoretical 
> nature, but it's worth fixing just to make valgrind happy again and allow it 
> to run in CI :) Also, leaking uninitialized memory over a wire is never nice, 
> and currently problem 1. ist never mitigated by the ipc writers (although the 
> problem 2 ist). 
> This PR changes the behaviour of AppendNull(s) and FinishInternal for the 
> affected builders and adds a lot of checks to ensure the correct 
> initialization. Most of the test succeed the current master, but cause 
> valgrind warnings, if the buffer builders aren't patched.
> I decided not to test explicitly if the nulled is really 0, just that it can 
> be read without errors, since the specs don't specify exact values. I check 
> the padded memory values explicitly, but I'm not completely sure, what the 
> best course of action is there: we could also ignored the values in padding 
> just checking if they are initialized, or we could adjust the spec in this 
> point. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to