IMPALA-5031: Don't dereference uninitialized memory as a bool. Every bool takes up a byte of memory, but not every byte of memory is a valid bool. Dereferencing that memory, even if only to write to, it is undefined behavior.
I tested exhaustive UBSAN and DEBUG. In UBSAN, the error is no longer present; in both UBSAN and DEBUG, all tests pass Change-Id: Ibc03401a9aeb966f33298268e53ab1040d007224 Reviewed-on: http://gerrit.cloudera.org:8080/7148 Reviewed-by: Jim Apple <[email protected]> Tested-by: Impala Public Jenkins Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/9fd46f7d Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/9fd46f7d Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/9fd46f7d Branch: refs/heads/master Commit: 9fd46f7df1f6f02d0db08155d4f49bcadec0ce5f Parents: 61696f3 Author: Jim Apple <[email protected]> Authored: Sat Jun 10 10:32:26 2017 -0700 Committer: Impala Public Jenkins <[email protected]> Committed: Mon Jun 12 23:54:40 2017 +0000 ---------------------------------------------------------------------- be/src/runtime/raw-value.cc | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/9fd46f7d/be/src/runtime/raw-value.cc ---------------------------------------------------------------------- diff --git a/be/src/runtime/raw-value.cc b/be/src/runtime/raw-value.cc index 872b356..b7b499d 100644 --- a/be/src/runtime/raw-value.cc +++ b/be/src/runtime/raw-value.cc @@ -117,7 +117,11 @@ void RawValue::Write(const void* value, void* dst, const ColumnType& type, case TYPE_NULL: break; case TYPE_BOOLEAN: - *reinterpret_cast<bool*>(dst) = *reinterpret_cast<const bool*>(value); + // Unlike the other scalar types, bool has a limited set of valid values, so if + // 'dst' is uninitialized memory and happens to point to a value that is not a valid + // bool, then dereferencing it via *reinterpret_cast<bool*>(dst) is undefined + // behavior. + memcpy(dst, value, sizeof(bool)); break; case TYPE_TINYINT: *reinterpret_cast<int8_t*>(dst) = *reinterpret_cast<const int8_t*>(value);
