924060929 opened a new pull request, #63615:
URL: https://github.com/apache/doris/pull/63615

   ## Proposed changes
   
   Fix `Rows mismatched! Data may be lost` error when a fragment receives data
   from multiple ExchangeNode inputs with different partition types (e.g. NLJ
   with HASH-partitioned probe + BROADCAST build).
   
   ### Root cause
   
   `ThriftPlansBuilder.filterInstancesWhichReceiveDataFromRemote` used
   `.iterator().next()` to pick the first input ExchangeNode. The iteration
   order over a `Set<Entry>` is non-deterministic, so the input that gets
   picked can vary. When it happens to be the BROADCAST input (1 destination
   per BE), `shuffle_idx_to_instance_idx` has only 1 entry, while the HASH
   LOCAL_EXCHANGE expects N entries (one per pipeline task). Most hash
   partition indices then find no mapping, and BE reports the error.
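To make the size mismatch concrete, here is a minimal sketch of the failure mode, not the actual Doris code. The class `ShuffleMapSketch` and both helpers are hypothetical, and it assumes for illustration that each destination contributes exactly one shuffle index to the map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; these names do not exist in Doris.
final class ShuffleMapSketch {

    // Assumption: one map entry per destination of the chosen input exchange,
    // mapping shuffle index i to local instance index i.
    static Map<Integer, Integer> buildShuffleIdxToInstanceIdx(int destinationCount) {
        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < destinationCount; i++) {
            map.put(i, i);
        }
        return map;
    }

    // Returns the hash partition indices (0..pipelineTasks-1) that have no
    // entry in the map, i.e. the partitions whose rows could not be routed.
    static List<Integer> unmappedPartitions(Map<Integer, Integer> map, int pipelineTasks) {
        List<Integer> missing = new ArrayList<>();
        for (int partition = 0; partition < pipelineTasks; partition++) {
            if (!map.containsKey(partition)) {
                missing.add(partition);
            }
        }
        return missing;
    }
}
```

With 4 pipeline tasks, a map built from the BROADCAST input (1 destination) leaves partitions 1, 2 and 3 unmapped, while a map built from the HASH input (4 destinations) covers all of them.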
   
   Reproduction: a CTE query with `MultiCastDataSinks` sending UNPARTITIONED
   (to a BROADCAST build) and HASH_PARTITIONED (to an INNER JOIN build) into
   the same scan-free fragment. The bug is non-deterministic because it depends
   on Set iteration order.
   
   ### Fix
   
   Iterate all input exchanges and select the one with the most destinations on
   the target worker. This correctly identifies the main data-carrying
   (HASH-partitioned) exchange, ensuring the map is complete.
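The selection rule can be sketched as follows. This is an illustrative approximation under stated assumptions, not the code in this PR: `ExchangeSelectionSketch`, `InputExchange`, `destinationsOn` and `pickMainExchange` are hypothetical names, and destinations are modelled simply as a list of worker identifiers.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical illustration only; these types do not exist in Doris.
final class ExchangeSelectionSketch {

    // Simplified stand-in for one input exchange of a fragment.
    static final class InputExchange {
        final String partitionType;
        final List<String> destinationWorkers; // one element per destination instance

        InputExchange(String partitionType, List<String> destinationWorkers) {
            this.partitionType = partitionType;
            this.destinationWorkers = destinationWorkers;
        }

        // Number of destinations of this exchange that live on the given worker.
        int destinationsOn(String worker) {
            int count = 0;
            for (String w : destinationWorkers) {
                if (w.equals(worker)) {
                    count++;
                }
            }
            return count;
        }
    }

    // Scans every input instead of taking whichever one the iterator yields
    // first, and keeps the input with the most destinations on the target
    // worker. Returns null when there are no inputs.
    static InputExchange pickMainExchange(Collection<InputExchange> inputs, String worker) {
        InputExchange best = null;
        int bestCount = -1;
        for (InputExchange input : inputs) {
            int count = input.destinationsOn(worker);
            if (count > bestCount) {
                best = input;
                bestCount = count;
            }
        }
        return best;
    }
}
```

Because every input is inspected, the result no longer depends on iteration order as long as one input has strictly more destinations on the worker than the others, which is the case for a HASH input (one destination per pipeline task) next to a BROADCAST input (one destination per BE).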
   
   ### Workaround
   
   `SET ENABLE_NEREIDS_DISTRIBUTE_PLANNER=false`
   
   ## Further comments
   
   Existing regression test:
   `dictionary_p0/dictionary_load_and_get/test_dict_load_and_get_ip_trie`
   reproduces a hang caused by the same root cause (dict refresh path triggers
   the multi-input fragment pattern). With this fix the test passes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

