rtpsw commented on PR #34392: URL: https://github.com/apache/arrow/pull/34392#issuecomment-1519077885
The [`AMD64 Conda C++` CI job](https://github.com/apache/arrow/actions/runs/4778086054/jobs/8494259405?pr=34392) shows that a similar failure with improved debug messages. The interesting observation from the pasted output below is for the comparison seen at the bottom: while the actual table is logically equal (up to flattening) to the expected table, the `AssertTablesEqual` comparison fails, as if the actual table had different data. Specifically, it "thinks" the actual data for column 4 (after flattening) is `[10, 11, 12]` whereas in fact it is `[10, null, 12]`. It looks like whatever race condition or root cause here actually affects this comparison logic and not `AsofJoinNode`. @westonpace, @icexelloss, @wjones127 - what are your thoughts about this? Pasted output: ``` /arrow/cpp/src/arrow/testing/gtest_util.cc:486: Failure Failed Column 4 is different: Unequal at absolute position 0 @@ -1, +1 @@ -11 +null Expected: [ 10, null, 12 ] Actual: [ 10, 11, 12 ] Google Test trace: /arrow/cpp/src/arrow/acero/asof_join_node_test.cc:625: Right-1 type: int16 /arrow/cpp/src/arrow/acero/asof_join_node_test.cc:623: Right-0 type: int32 /arrow/cpp/src/arrow/acero/asof_join_node_test.cc:621: Left type: int16 /arrow/cpp/src/arrow/acero/asof_join_node_test.cc:619: Key type: binary /arrow/cpp/src/arrow/acero/asof_join_node_test.cc:617: Time type: int16 /arrow/cpp/src/arrow/acero/asof_join_node_test.cc:615: Iteration: 28 /arrow/cpp/src/arrow/acero/asof_join_node_test.cc:606: Types seed: 1682252434581436418 /arrow/cpp/src/arrow/acero/asof_join_node_test.cc:992: AsofJoinBasicTest_TestBasic7_MutateByKey /arrow/cpp/src/arrow/acero/asof_join_node_test.cc:991: AsofJoinBasicTest_TestBasic7Forward_MutateByKey AsofJoinTest debug: AsofjoinNode(0x55b12a56d600): received batch from input 2: time: [ 00:00:00.000000, 00:00:00.001500, 00:00:00.002500 ] key2: [ 1970-01-01 00:00:00.000000, 1970-01-01 00:00:00.000000, 1970-01-01 00:00:00.000000 ] key: [ 1970-01-01 00:00:00.000001, 1970-01-01 00:00:00.000001, 1970-01-01 00:00:00.000001 ] r1_v0: [ 1970-01-01 00:00:00.000000100, 1970-01-01 00:00:00.000000101, 1970-01-01 00:00:00.000000102 ] AsofjoinNode(0x55b12a56d600): received batch from input 1: time: [ 00:00:00.000000, 00:00:00.001500, 00:00:00.002500 ] key2: [ 1970-01-01 00:00:00.000000, 1970-01-01 00:00:00.000000, 1970-01-01 00:00:00.000000 ] key: [ 1970-01-01 00:00:00.000001, 1970-01-01 00:00:00.000001, 1970-01-01 00:00:00.000001 ] r0_v0: [ 10, 11, 12 ] AsofjoinNode(0x55b12a56d600): received batch from input 0: time: [ 00:00:00.000000 ] key2: [ 1970-01-01 00:00:00.000000 ] key: [ 1970-01-01 00:00:00.000001 ] l_v0: [ 00:00:00.000001 ] AsofjoinNode(0x55b12a56d600): received batch from input 0: time: [ 00:00:00.001000 ] key2: [ 1970-01-01 00:00:00.000000 ] key: [ 1970-01-01 00:00:00.000002 ] l_v0: [ 00:00:00.000002 ] AsofjoinNode(0x55b12a56d600): received batch from input 0: time: [ 00:00:00.002000 ] key2: [ 1970-01-01 00:00:00.000000 ] key: [ 1970-01-01 00:00:00.000001 ] l_v0: [ 00:00:00.000003 ] AsofjoinNode(0x55b12a56d600): Advancing input 1 AsofjoinNode(0x55b12a56d600): key hasher 1 got hashes [10621589963589458110, 10621589963589458110, 10621589963589458110] AsofjoinNode(0x55b12a56d600): memo 1 store: for_time=0 row=0 time=0 key=10621589963589458110 AsofjoinNode(0x55b12a56d600): Advancing input 1 hit distant time=1500 at=0 AsofjoinNode(0x55b12a56d600): memo 1 remove: ts=0 AsofjoinNode(0x55b12a56d600): Advancing input 1 updated=0 AsofjoinNode(0x55b12a56d600): Advancing input 2 AsofjoinNode(0x55b12a56d600): key hasher 2 got hashes [10621589963589458110, 10621589963589458110, 10621589963589458110] AsofjoinNode(0x55b12a56d600): memo 2 store: for_time=0 row=0 time=0 key=10621589963589458110 AsofjoinNode(0x55b12a56d600): Advancing input 2 hit distant time=1500 at=0 AsofjoinNode(0x55b12a56d600): memo 2 remove: ts=0 AsofjoinNode(0x55b12a56d600): Advancing input 2 updated=0 AsofjoinNode(0x55b12a56d600): key hasher 0 got hashes [10621589963589458110] AsofjoinNode(0x55b12a56d600): Emplace: key=10621589963589458110 lhs_latest_row=0 lhs_latest_time=0 AsofjoinNode(0x55b12a56d600): i=1 has_entry=1 time=0 row=0 accepted=1 AsofjoinNode(0x55b12a56d600): i=2 has_entry=1 time=0 row=0 accepted=1 AsofjoinNode(0x55b12a56d600): memo 1 remove: ts=1000 AsofjoinNode(0x55b12a56d600): memo 2 remove: ts=1000 AsofjoinNode(0x55b12a56d600): produce batch 0: time: [ 00:00:00.000000 ] key2: [ 1970-01-01 00:00:00.000000 ] key: [ 1970-01-01 00:00:00.000001 ] l_v0: [ 00:00:00.000001 ] r0_v0: [ 10 ] r1_v0: [ 1970-01-01 00:00:00.000000100 ] AsofjoinNode(0x55b12a56d600): Advancing input 1 AsofjoinNode(0x55b12a56d600): memo 1 store: for_time=1000 row=1 time=1500 key=10621589963589458110 AsofjoinNode(0x55b12a56d600): Advancing input 1 hit distant time=2500 at=1000 AsofjoinNode(0x55b12a56d600): memo 1 remove: ts=1000 AsofjoinNode(0x55b12a56d600): Advancing input 1 updated=0 AsofjoinNode(0x55b12a56d600): Advancing input 2 AsofjoinNode(0x55b12a56d600): memo 2 store: for_time=1000 row=1 time=1500 key=10621589963589458110 AsofjoinNode(0x55b12a56d600): Advancing input 2 hit distant time=2500 at=1000 AsofjoinNode(0x55b12a56d600): memo 2 remove: ts=1000 AsofjoinNode(0x55b12a56d600): Advancing input 2 updated=0 AsofjoinNode(0x55b12a56d600): key hasher 0 got hashes [12073851083248405108] AsofjoinNode(0x55b12a56d600): Emplace: key=12073851083248405108 lhs_latest_row=0 lhs_latest_time=1000 AsofjoinNode(0x55b12a56d600): i=1 has_entry=0 time=0 row=0 accepted=0 AsofjoinNode(0x55b12a56d600): i=2 has_entry=0 time=0 row=0 accepted=0 AsofjoinNode(0x55b12a56d600): memo 1 remove: ts=2000 AsofjoinNode(0x55b12a56d600): memo 2 remove: ts=2000 AsofjoinNode(0x55b12a56d600): produce batch 1: time: [ 00:00:00.001000 ] key2: [ 1970-01-01 00:00:00.000000 ] key: [ 1970-01-01 00:00:00.000002 ] l_v0: [ 00:00:00.000002 ] r0_v0: [ null ] r1_v0: [ null ] AsofjoinNode(0x55b12a56d600): Advancing input 1 AsofjoinNode(0x55b12a56d600): memo 1 store: for_time=2000 row=2 time=2500 key=10621589963589458110 AsofjoinNode(0x55b12a56d600): memo 1 remove: ts=2000 AsofjoinNode(0x55b12a56d600): Advancing input 1 updated=0 AsofjoinNode(0x55b12a56d600): Advancing input 2 AsofjoinNode(0x55b12a56d600): memo 2 store: for_time=2000 row=2 time=2500 key=10621589963589458110 AsofjoinNode(0x55b12a56d600): memo 2 remove: ts=2000 AsofjoinNode(0x55b12a56d600): Advancing input 2 updated=0 AsofjoinNode(0x55b12a56d600): key hasher 0 got hashes [10621589963589458110] AsofjoinNode(0x55b12a56d600): Emplace: key=10621589963589458110 lhs_latest_row=0 lhs_latest_time=2000 AsofjoinNode(0x55b12a56d600): i=1 has_entry=1 time=2500 row=2 accepted=1 AsofjoinNode(0x55b12a56d600): i=2 has_entry=1 time=2500 row=2 accepted=1 AsofjoinNode(0x55b12a56d600): produce batch 2: time: [ 00:00:00.002000 ] key2: [ 1970-01-01 00:00:00.000000 ] key: [ 1970-01-01 00:00:00.000001 ] l_v0: [ 00:00:00.000003 ] r0_v0: [ 12 ] r1_v0: [ 1970-01-01 00:00:00.000000102 ] Comparing flattened expected table: time: time64[us] key2: timestamp[us, tz=UTC] key: timestamp[us, tz=UTC] l_v0: time64[us] r0_v0: uint16 r1_v0: timestamp[ns, tz=UTC] ---- time: [ [ 00:00:00.000000, 00:00:00.001000, 00:00:00.002000 ] ] key2: [ [ 1970-01-01 00:00:00.000000, 1970-01-01 00:00:00.000000, 1970-01-01 00:00:00.000000 ] ] key: [ [ 1970-01-01 00:00:00.000001, 1970-01-01 00:00:00.000002, 1970-01-01 00:00:00.000001 ] ] l_v0: [ [ 00:00:00.000001, 00:00:00.000002, 00:00:00.000003 ] ] r0_v0: [ [ 10, null, 12 ] ] r1_v0: [ [ 1970-01-01 00:00:00.000000100, null, 1970-01-01 00:00:00.000000102 ] ] with flattened result table: time: time64[us] key2: timestamp[us, tz=UTC] key: timestamp[us, tz=UTC] l_v0: time64[us] r0_v0: uint16 r1_v0: timestamp[ns, tz=UTC] ---- time: [ [ 00:00:00.000000 ], [ 00:00:00.001000 ], [ 00:00:00.002000 ] ] key2: [ [ 1970-01-01 00:00:00.000000 ], [ 1970-01-01 00:00:00.000000 ], [ 1970-01-01 00:00:00.000000 ] ] key: [ [ 1970-01-01 00:00:00.000001 ], [ 1970-01-01 00:00:00.000002 ], [ 1970-01-01 00:00:00.000001 ] ] l_v0: [ [ 00:00:00.000001 ], [ 00:00:00.000002 ], [ 00:00:00.000003 ] ] r0_v0: [ [ 10 ], [ null ], [ 12 ] ] r1_v0: [ [ 1970-01-01 00:00:00.000000100 ], [ null ], [ 1970-01-01 00:00:00.000000102 ] ] ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
