Hi,

In my local code, I observed a test assertion printout that seems misleading.

The printout looks like this:

Expected equality of these values:
  expected_empty_segment
    Which is: 24-byte object <00-00 00-00 00-00 00-00 02-00 00-00 00-00 00-00 
00-00 00-00 00-00 00-00>
  empty_segment
    Which is: 24-byte object <00-00 00-00 00-00 00-00 02-00 00-00 00-00 00-00 
01-9F 2F-9B 90-51 F6-94>

It was caused by this line:

ASSERT_EQ(expected_empty_segment, empty_segment)

The comparison of the two instances involves this code:

struct ARROW_EXPORT GroupingSegment {
  int64_t offset;
  int64_t length;
  bool is_open;
};

inline bool operator==(const GroupingSegment& segment1, const GroupingSegment& 
segment2) {
  return segment1.offset == segment2.offset && segment1.length == 
segment2.length &&
      segment1.is_open == segment2.is_open;
}

The corresponding values of these instances are:

GroupingSegment{0, 2, false}
GroupingSegment{0, 2, true}

What seems misleading is that the printout says each instance is a 24-bytes 
object, which is technically true due to alignment of the struct to its largest 
member's size, but effectively only the first 17-bytes are valid. Taking 
24-bytes, the printout includes 7 extra bytes to the right. In this case, the 
extra bytes are garbage, which changes from one run to the next, and it is 
clear they are uninitialized. However, in general, it is not easy to 
distinguish between uninitialized and overran memory.

In the above simple case, it is easy enough to figure out what's going on, but 
I can imagine that in complex cases, especially ones where memory overruns are 
suspected, a misleading printout could be much harder to figure out.

To improve, a possible solution for this is to define a template function 
returning the list (or bitmap) of byte-offsets that are valid for a given 
instance. For example, one could define that a struct {bool, int32_t} has valid 
byte-offsets {0, 4, 5, 6, 7} and this could likely even be automated, if 
desirable. Then, these byte-offsets could be used in a printout that marks the 
invalid bytes, e.g., the instance {true, 0x01010101} could be printed as 
"<01-xx xx-xx xx-xx 01-01 01-01>". An instance for which the template function 
is not defined could be printed like currently but with a warning that some 
printed bytes may be uninitialized.


Cheers,
Yaron.

Reply via email to