Le 19/09/2022 à 10:05, Yaron Gvili a écrit :
Hi Antoine,
Thanks. I understand it is a GoogleTest printout, but do you agree this
_default_ printing can be misleading? I believe it'd be better to change this
default printing, if possible. In addition, I think it is likely possible to
automate the non-default printout of structs (with memory holes) that are
defined in Arrow. WDYT?
I agree it's misleading. However, I don't know how your suggestion can
be implemented, so feel free to suggest a prototype and/or submit a PR :-)
Regards
Antoine.
Yaron.
________________________________
From: Antoine Pitrou <anto...@python.org>
Sent: Monday, September 19, 2022 3:57 AM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: apparently misleading test assertion printout
Hi Yaron,
This is what GoogleTest does when it doesn't know how to print out a
value. Guidance to fix this at:
https://github.com/google/googletest/blob/main/docs/advanced.md#teaching-googletest-how-to-print-your-values
Regards
Antoine.
Le 19/09/2022 à 09:54, Yaron Gvili a écrit :
Hi,
In my local code, I observed a test assertion printout that seems misleading.
The printout looks like this:
Expected equality of these values:
expected_empty_segment
Which is: 24-byte object <00-00 00-00 00-00 00-00 02-00 00-00 00-00 00-00
00-00 00-00 00-00 00-00>
empty_segment
Which is: 24-byte object <00-00 00-00 00-00 00-00 02-00 00-00 00-00 00-00
01-9F 2F-9B 90-51 F6-94>
It was caused by this line:
ASSERT_EQ(expected_empty_segment, empty_segment)
The comparison of the two instances involves this code:
struct ARROW_EXPORT GroupingSegment {
int64_t offset;
int64_t length;
bool is_open;
};
inline bool operator==(const GroupingSegment& segment1, const GroupingSegment&
segment2) {
return segment1.offset == segment2.offset && segment1.length == segment2.length
&&
segment1.is_open == segment2.is_open;
}
The corresponding values of these instances are:
GroupingSegment{0, 2, false}
GroupingSegment{0, 2, true}
What seems misleading is that the printout says each instance is a 24-bytes
object, which is technically true due to alignment of the struct to its largest
member's size, but effectively only the first 17-bytes are valid. Taking
24-bytes, the printout includes 7 extra bytes to the right. In this case, the
extra bytes are garbage, which changes from one run to the next, and it is
clear they are uninitialized. However, in general, it is not easy to
distinguish between uninitialized and overran memory.
In the above simple case, it is easy enough to figure out what's going on, but
I can imagine that in complex cases, especially ones where memory overruns are
suspected, a misleading printout could be much harder to figure out.
To improve, a possible solution for this is to define a template function returning the list
(or bitmap) of byte-offsets that are valid for a given instance. For example, one could define
that a struct {bool, int32_t} has valid byte-offsets {0, 4, 5, 6, 7} and this could likely even
be automated, if desirable. Then, these byte-offsets could be used in a printout that marks the
invalid bytes, e.g., the instance {true, 0x01010101} could be printed as "<01-xx xx-xx
xx-xx 01-01 01-01>". An instance for which the template function is not defined could
be printed like currently but with a warning that some printed bytes may be uninitialized.
Cheers,
Yaron.