Hi Antoine, Thanks. I understand it is a GoogleTest printout, but do you agree this _default_ printing can be misleading? I believe it'd be better to change this default printing, if possible. In addition, I think it is likely possible to automate the non-default printout of structs (with memory holes) that are defined in Arrow. WDYT?
Yaron. ________________________________ From: Antoine Pitrou <anto...@python.org> Sent: Monday, September 19, 2022 3:57 AM To: dev@arrow.apache.org <dev@arrow.apache.org> Subject: Re: apparently misleading test assertion printout Hi Yaron, This is what GoogleTest does when it doesn't know how to print out a value. Guidance to fix this at: https://github.com/google/googletest/blob/main/docs/advanced.md#teaching-googletest-how-to-print-your-values Regards Antoine. Le 19/09/2022 à 09:54, Yaron Gvili a écrit : > Hi, > > In my local code, I observed a test assertion printout that seems misleading. > > The printout looks like this: > > Expected equality of these values: > expected_empty_segment > Which is: 24-byte object <00-00 00-00 00-00 00-00 02-00 00-00 00-00 > 00-00 00-00 00-00 00-00 00-00> > empty_segment > Which is: 24-byte object <00-00 00-00 00-00 00-00 02-00 00-00 00-00 > 00-00 01-9F 2F-9B 90-51 F6-94> > > It was caused by this line: > > ASSERT_EQ(expected_empty_segment, empty_segment) > > The comparison of the two instances involves this code: > > struct ARROW_EXPORT GroupingSegment { > int64_t offset; > int64_t length; > bool is_open; > }; > > inline bool operator==(const GroupingSegment& segment1, const > GroupingSegment& segment2) { > return segment1.offset == segment2.offset && segment1.length == > segment2.length && > segment1.is_open == segment2.is_open; > } > > The corresponding values of these instances are: > > GroupingSegment{0, 2, false} > GroupingSegment{0, 2, true} > > What seems misleading is that the printout says each instance is a 24-bytes > object, which is technically true due to alignment of the struct to its > largest member's size, but effectively only the first 17-bytes are valid. > Taking 24-bytes, the printout includes 7 extra bytes to the right. In this > case, the extra bytes are garbage, which changes from one run to the next, > and it is clear they are uninitialized. However, in general, it is not easy > to distinguish between uninitialized and overran memory. > > In the above simple case, it is easy enough to figure out what's going on, > but I can imagine that in complex cases, especially ones where memory > overruns are suspected, a misleading printout could be much harder to figure > out. > > To improve, a possible solution for this is to define a template function > returning the list (or bitmap) of byte-offsets that are valid for a given > instance. For example, one could define that a struct {bool, int32_t} has > valid byte-offsets {0, 4, 5, 6, 7} and this could likely even be automated, > if desirable. Then, these byte-offsets could be used in a printout that marks > the invalid bytes, e.g., the instance {true, 0x01010101} could be printed as > "<01-xx xx-xx xx-xx 01-01 01-01>". An instance for which the template > function is not defined could be printed like currently but with a warning > that some printed bytes may be uninitialized. > > > Cheers, > Yaron. >