Hi Antoine,

Thanks. I understand it is a GoogleTest printout, but do you agree this 
_default_ printing can be misleading? I believe it'd be better to change this 
default printing, if possible. In addition, I think it is likely possible to 
automate the non-default printout of structs (with memory holes) that are 
defined in Arrow. WDYT?


Yaron.
________________________________
From: Antoine Pitrou <anto...@python.org>
Sent: Monday, September 19, 2022 3:57 AM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: apparently misleading test assertion printout


Hi Yaron,

This is what GoogleTest does when it doesn't know how to print out a
value. Guidance to fix this at:
https://github.com/google/googletest/blob/main/docs/advanced.md#teaching-googletest-how-to-print-your-values

Regards

Antoine.


Le 19/09/2022 à 09:54, Yaron Gvili a écrit :
> Hi,
>
> In my local code, I observed a test assertion printout that seems misleading.
>
> The printout looks like this:
>
> Expected equality of these values:
>    expected_empty_segment
>      Which is: 24-byte object <00-00 00-00 00-00 00-00 02-00 00-00 00-00 
> 00-00 00-00 00-00 00-00 00-00>
>    empty_segment
>      Which is: 24-byte object <00-00 00-00 00-00 00-00 02-00 00-00 00-00 
> 00-00 01-9F 2F-9B 90-51 F6-94>
>
> It was caused by this line:
>
> ASSERT_EQ(expected_empty_segment, empty_segment)
>
> The comparison of the two instances involves this code:
>
> struct ARROW_EXPORT GroupingSegment {
>    int64_t offset;
>    int64_t length;
>    bool is_open;
> };
>
> inline bool operator==(const GroupingSegment& segment1, const 
> GroupingSegment& segment2) {
>    return segment1.offset == segment2.offset && segment1.length == 
> segment2.length &&
>        segment1.is_open == segment2.is_open;
> }
>
> The corresponding values of these instances are:
>
> GroupingSegment{0, 2, false}
> GroupingSegment{0, 2, true}
>
> What seems misleading is that the printout says each instance is a 24-bytes 
> object, which is technically true due to alignment of the struct to its 
> largest member's size, but effectively only the first 17-bytes are valid. 
> Taking 24-bytes, the printout includes 7 extra bytes to the right. In this 
> case, the extra bytes are garbage, which changes from one run to the next, 
> and it is clear they are uninitialized. However, in general, it is not easy 
> to distinguish between uninitialized and overran memory.
>
> In the above simple case, it is easy enough to figure out what's going on, 
> but I can imagine that in complex cases, especially ones where memory 
> overruns are suspected, a misleading printout could be much harder to figure 
> out.
>
> To improve, a possible solution for this is to define a template function 
> returning the list (or bitmap) of byte-offsets that are valid for a given 
> instance. For example, one could define that a struct {bool, int32_t} has 
> valid byte-offsets {0, 4, 5, 6, 7} and this could likely even be automated, 
> if desirable. Then, these byte-offsets could be used in a printout that marks 
> the invalid bytes, e.g., the instance {true, 0x01010101} could be printed as 
> "<01-xx xx-xx xx-xx 01-01 01-01>". An instance for which the template 
> function is not defined could be printed like currently but with a warning 
> that some printed bytes may be uninitialized.
>
>
> Cheers,
> Yaron.
>

Reply via email to