Le 19/09/2022 à 10:05, Yaron Gvili a écrit :
Hi Antoine,

Thanks. I understand it is a GoogleTest printout, but do you agree this 
_default_ printing can be misleading? I believe it'd be better to change this 
default printing, if possible. In addition, I think it is likely possible to 
automate the non-default printout of structs (with memory holes) that are 
defined in Arrow. WDYT?

I agree it's misleading. However, I don't know how your suggestion can be implemented, so feel free to suggest a prototype and/or submit a PR :-)

Regards

Antoine.




Yaron.
________________________________
From: Antoine Pitrou <anto...@python.org>
Sent: Monday, September 19, 2022 3:57 AM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: apparently misleading test assertion printout


Hi Yaron,

This is what GoogleTest does when it doesn't know how to print out a
value. Guidance to fix this at:
https://github.com/google/googletest/blob/main/docs/advanced.md#teaching-googletest-how-to-print-your-values

Regards

Antoine.


Le 19/09/2022 à 09:54, Yaron Gvili a écrit :
Hi,

In my local code, I observed a test assertion printout that seems misleading.

The printout looks like this:

Expected equality of these values:
    expected_empty_segment
      Which is: 24-byte object <00-00 00-00 00-00 00-00 02-00 00-00 00-00 00-00 
00-00 00-00 00-00 00-00>
    empty_segment
      Which is: 24-byte object <00-00 00-00 00-00 00-00 02-00 00-00 00-00 00-00 
01-9F 2F-9B 90-51 F6-94>

It was caused by this line:

ASSERT_EQ(expected_empty_segment, empty_segment)

The comparison of the two instances involves this code:

struct ARROW_EXPORT GroupingSegment {
    int64_t offset;
    int64_t length;
    bool is_open;
};

inline bool operator==(const GroupingSegment& segment1, const GroupingSegment& 
segment2) {
    return segment1.offset == segment2.offset && segment1.length == segment2.length 
&&
        segment1.is_open == segment2.is_open;
}

The corresponding values of these instances are:

GroupingSegment{0, 2, false}
GroupingSegment{0, 2, true}

What seems misleading is that the printout says each instance is a 24-bytes 
object, which is technically true due to alignment of the struct to its largest 
member's size, but effectively only the first 17-bytes are valid. Taking 
24-bytes, the printout includes 7 extra bytes to the right. In this case, the 
extra bytes are garbage, which changes from one run to the next, and it is 
clear they are uninitialized. However, in general, it is not easy to 
distinguish between uninitialized and overran memory.

In the above simple case, it is easy enough to figure out what's going on, but 
I can imagine that in complex cases, especially ones where memory overruns are 
suspected, a misleading printout could be much harder to figure out.

To improve, a possible solution for this is to define a template function returning the list 
(or bitmap) of byte-offsets that are valid for a given instance. For example, one could define 
that a struct {bool, int32_t} has valid byte-offsets {0, 4, 5, 6, 7} and this could likely even 
be automated, if desirable. Then, these byte-offsets could be used in a printout that marks the 
invalid bytes, e.g., the instance {true, 0x01010101} could be printed as "<01-xx xx-xx 
xx-xx 01-01 01-01>". An instance for which the template function is not defined could 
be printed like currently but with a warning that some printed bytes may be uninitialized.


Cheers,
Yaron.


Reply via email to