rtpsw commented on PR #34392:
URL: https://github.com/apache/arrow/pull/34392#issuecomment-1519077885

   The [`AMD64 Conda C++` CI 
job](https://github.com/apache/arrow/actions/runs/4778086054/jobs/8494259405?pr=34392)
 shows that a similar failure with improved debug messages. The interesting 
observation from the pasted output below is for the comparison seen at the 
bottom: while the actual table is logically equal (up to flattening) to the 
expected table, the `AssertTablesEqual` comparison fails, as if the actual 
table had different data. Specifically, it "thinks" the actual data for column 
4 (after flattening) is `[10, 11, 12]` whereas in fact it is `[10, null, 12]`. 
It looks like whatever race condition or root cause here actually affects this 
comparison logic and not `AsofJoinNode`.
   
   @westonpace, @icexelloss, @wjones127 - what are your thoughts about this?
   
   Pasted output:
   ```
   /arrow/cpp/src/arrow/testing/gtest_util.cc:486: Failure
   Failed
   Column 4 is different: Unequal at absolute position 0
   
   @@ -1, +1 @@
   -11
   +null
   Expected:
     [
       10,
       null,
       12
     ]
   Actual:
     [
       10,
       11,
       12
     ]
   Google Test trace:
   /arrow/cpp/src/arrow/acero/asof_join_node_test.cc:625: Right-1 type: int16
   /arrow/cpp/src/arrow/acero/asof_join_node_test.cc:623: Right-0 type: int32
   /arrow/cpp/src/arrow/acero/asof_join_node_test.cc:621: Left type: int16
   /arrow/cpp/src/arrow/acero/asof_join_node_test.cc:619: Key type: binary
   /arrow/cpp/src/arrow/acero/asof_join_node_test.cc:617: Time type: int16
   /arrow/cpp/src/arrow/acero/asof_join_node_test.cc:615: Iteration: 28
   /arrow/cpp/src/arrow/acero/asof_join_node_test.cc:606: Types seed: 
1682252434581436418
   /arrow/cpp/src/arrow/acero/asof_join_node_test.cc:992: 
AsofJoinBasicTest_TestBasic7_MutateByKey
   /arrow/cpp/src/arrow/acero/asof_join_node_test.cc:991: 
AsofJoinBasicTest_TestBasic7Forward_MutateByKey
   AsofJoinTest debug:
   AsofjoinNode(0x55b12a56d600): received batch from input 2:
   time:   [
       00:00:00.000000,
       00:00:00.001500,
       00:00:00.002500
     ]
   key2:   [
       1970-01-01 00:00:00.000000,
       1970-01-01 00:00:00.000000,
       1970-01-01 00:00:00.000000
     ]
   key:   [
       1970-01-01 00:00:00.000001,
       1970-01-01 00:00:00.000001,
       1970-01-01 00:00:00.000001
     ]
   r1_v0:   [
       1970-01-01 00:00:00.000000100,
       1970-01-01 00:00:00.000000101,
       1970-01-01 00:00:00.000000102
     ]
   
   AsofjoinNode(0x55b12a56d600): received batch from input 1:
   time:   [
       00:00:00.000000,
       00:00:00.001500,
       00:00:00.002500
     ]
   key2:   [
       1970-01-01 00:00:00.000000,
       1970-01-01 00:00:00.000000,
       1970-01-01 00:00:00.000000
     ]
   key:   [
       1970-01-01 00:00:00.000001,
       1970-01-01 00:00:00.000001,
       1970-01-01 00:00:00.000001
     ]
   r0_v0:   [
       10,
       11,
       12
     ]
   
   AsofjoinNode(0x55b12a56d600): received batch from input 0:
   time:   [
       00:00:00.000000
     ]
   key2:   [
       1970-01-01 00:00:00.000000
     ]
   key:   [
       1970-01-01 00:00:00.000001
     ]
   l_v0:   [
       00:00:00.000001
     ]
   
   AsofjoinNode(0x55b12a56d600): received batch from input 0:
   time:   [
       00:00:00.001000
     ]
   key2:   [
       1970-01-01 00:00:00.000000
     ]
   key:   [
       1970-01-01 00:00:00.000002
     ]
   l_v0:   [
       00:00:00.000002
     ]
   
   AsofjoinNode(0x55b12a56d600): received batch from input 0:
   time:   [
       00:00:00.002000
     ]
   key2:   [
       1970-01-01 00:00:00.000000
     ]
   key:   [
       1970-01-01 00:00:00.000001
     ]
   l_v0:   [
       00:00:00.000003
     ]
   
   AsofjoinNode(0x55b12a56d600): Advancing input 1
   AsofjoinNode(0x55b12a56d600): key hasher 1 got hashes [10621589963589458110, 
10621589963589458110, 10621589963589458110]
   AsofjoinNode(0x55b12a56d600): memo 1 store: for_time=0 row=0 time=0 
key=10621589963589458110
   AsofjoinNode(0x55b12a56d600): Advancing input 1 hit distant time=1500 at=0
   AsofjoinNode(0x55b12a56d600): memo 1 remove: ts=0
   AsofjoinNode(0x55b12a56d600): Advancing input 1 updated=0
   AsofjoinNode(0x55b12a56d600): Advancing input 2
   AsofjoinNode(0x55b12a56d600): key hasher 2 got hashes [10621589963589458110, 
10621589963589458110, 10621589963589458110]
   AsofjoinNode(0x55b12a56d600): memo 2 store: for_time=0 row=0 time=0 
key=10621589963589458110
   AsofjoinNode(0x55b12a56d600): Advancing input 2 hit distant time=1500 at=0
   AsofjoinNode(0x55b12a56d600): memo 2 remove: ts=0
   AsofjoinNode(0x55b12a56d600): Advancing input 2 updated=0
   AsofjoinNode(0x55b12a56d600): key hasher 0 got hashes [10621589963589458110]
   AsofjoinNode(0x55b12a56d600): Emplace: key=10621589963589458110 
lhs_latest_row=0 lhs_latest_time=0
   AsofjoinNode(0x55b12a56d600):   i=1 has_entry=1 time=0 row=0 accepted=1
   AsofjoinNode(0x55b12a56d600):   i=2 has_entry=1 time=0 row=0 accepted=1
   AsofjoinNode(0x55b12a56d600): memo 1 remove: ts=1000
   AsofjoinNode(0x55b12a56d600): memo 2 remove: ts=1000
   AsofjoinNode(0x55b12a56d600): produce batch 0:
   time:   [
       00:00:00.000000
     ]
   key2:   [
       1970-01-01 00:00:00.000000
     ]
   key:   [
       1970-01-01 00:00:00.000001
     ]
   l_v0:   [
       00:00:00.000001
     ]
   r0_v0:   [
       10
     ]
   r1_v0:   [
       1970-01-01 00:00:00.000000100
     ]
   
   AsofjoinNode(0x55b12a56d600): Advancing input 1
   AsofjoinNode(0x55b12a56d600): memo 1 store: for_time=1000 row=1 time=1500 
key=10621589963589458110
   AsofjoinNode(0x55b12a56d600): Advancing input 1 hit distant time=2500 at=1000
   AsofjoinNode(0x55b12a56d600): memo 1 remove: ts=1000
   AsofjoinNode(0x55b12a56d600): Advancing input 1 updated=0
   AsofjoinNode(0x55b12a56d600): Advancing input 2
   AsofjoinNode(0x55b12a56d600): memo 2 store: for_time=1000 row=1 time=1500 
key=10621589963589458110
   AsofjoinNode(0x55b12a56d600): Advancing input 2 hit distant time=2500 at=1000
   AsofjoinNode(0x55b12a56d600): memo 2 remove: ts=1000
   AsofjoinNode(0x55b12a56d600): Advancing input 2 updated=0
   AsofjoinNode(0x55b12a56d600): key hasher 0 got hashes [12073851083248405108]
   AsofjoinNode(0x55b12a56d600): Emplace: key=12073851083248405108 
lhs_latest_row=0 lhs_latest_time=1000
   AsofjoinNode(0x55b12a56d600):   i=1 has_entry=0 time=0 row=0 accepted=0
   AsofjoinNode(0x55b12a56d600):   i=2 has_entry=0 time=0 row=0 accepted=0
   AsofjoinNode(0x55b12a56d600): memo 1 remove: ts=2000
   AsofjoinNode(0x55b12a56d600): memo 2 remove: ts=2000
   AsofjoinNode(0x55b12a56d600): produce batch 1:
   time:   [
       00:00:00.001000
     ]
   key2:   [
       1970-01-01 00:00:00.000000
     ]
   key:   [
       1970-01-01 00:00:00.000002
     ]
   l_v0:   [
       00:00:00.000002
     ]
   r0_v0:   [
       null
     ]
   r1_v0:   [
       null
     ]
   
   AsofjoinNode(0x55b12a56d600): Advancing input 1
   AsofjoinNode(0x55b12a56d600): memo 1 store: for_time=2000 row=2 time=2500 
key=10621589963589458110
   AsofjoinNode(0x55b12a56d600): memo 1 remove: ts=2000
   AsofjoinNode(0x55b12a56d600): Advancing input 1 updated=0
   AsofjoinNode(0x55b12a56d600): Advancing input 2
   AsofjoinNode(0x55b12a56d600): memo 2 store: for_time=2000 row=2 time=2500 
key=10621589963589458110
   AsofjoinNode(0x55b12a56d600): memo 2 remove: ts=2000
   AsofjoinNode(0x55b12a56d600): Advancing input 2 updated=0
   AsofjoinNode(0x55b12a56d600): key hasher 0 got hashes [10621589963589458110]
   AsofjoinNode(0x55b12a56d600): Emplace: key=10621589963589458110 
lhs_latest_row=0 lhs_latest_time=2000
   AsofjoinNode(0x55b12a56d600):   i=1 has_entry=1 time=2500 row=2 accepted=1
   AsofjoinNode(0x55b12a56d600):   i=2 has_entry=1 time=2500 row=2 accepted=1
   AsofjoinNode(0x55b12a56d600): produce batch 2:
   time:   [
       00:00:00.002000
     ]
   key2:   [
       1970-01-01 00:00:00.000000
     ]
   key:   [
       1970-01-01 00:00:00.000001
     ]
   l_v0:   [
       00:00:00.000003
     ]
   r0_v0:   [
       12
     ]
   r1_v0:   [
       1970-01-01 00:00:00.000000102
     ]
   
   Comparing flattened expected table:
   time: time64[us]
   key2: timestamp[us, tz=UTC]
   key: timestamp[us, tz=UTC]
   l_v0: time64[us]
   r0_v0: uint16
   r1_v0: timestamp[ns, tz=UTC]
   ----
   time:
     [
       [
         00:00:00.000000,
         00:00:00.001000,
         00:00:00.002000
       ]
     ]
   key2:
     [
       [
         1970-01-01 00:00:00.000000,
         1970-01-01 00:00:00.000000,
         1970-01-01 00:00:00.000000
       ]
     ]
   key:
     [
       [
         1970-01-01 00:00:00.000001,
         1970-01-01 00:00:00.000002,
         1970-01-01 00:00:00.000001
       ]
     ]
   l_v0:
     [
       [
         00:00:00.000001,
         00:00:00.000002,
         00:00:00.000003
       ]
     ]
   r0_v0:
     [
       [
         10,
         null,
         12
       ]
     ]
   r1_v0:
     [
       [
         1970-01-01 00:00:00.000000100,
         null,
         1970-01-01 00:00:00.000000102
       ]
     ]
   
   with flattened result table:
   time: time64[us]
   key2: timestamp[us, tz=UTC]
   key: timestamp[us, tz=UTC]
   l_v0: time64[us]
   r0_v0: uint16
   r1_v0: timestamp[ns, tz=UTC]
   ----
   time:
     [
       [
         00:00:00.000000
       ],
       [
         00:00:00.001000
       ],
       [
         00:00:00.002000
       ]
     ]
   key2:
     [
       [
         1970-01-01 00:00:00.000000
       ],
       [
         1970-01-01 00:00:00.000000
       ],
       [
         1970-01-01 00:00:00.000000
       ]
     ]
   key:
     [
       [
         1970-01-01 00:00:00.000001
       ],
       [
         1970-01-01 00:00:00.000002
       ],
       [
         1970-01-01 00:00:00.000001
       ]
     ]
   l_v0:
     [
       [
         00:00:00.000001
       ],
       [
         00:00:00.000002
       ],
       [
         00:00:00.000003
       ]
     ]
   r0_v0:
     [
       [
         10
       ],
       [
         null
       ],
       [
         12
       ]
     ]
   r1_v0:
     [
       [
         1970-01-01 00:00:00.000000100
       ],
       [
         null
       ],
       [
         1970-01-01 00:00:00.000000102
       ]
     ]
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to