alamb commented on issue #8130:
URL: 
https://github.com/apache/arrow-datafusion/issues/8130#issuecomment-1809237585

   I started hacking on this locally -- the basic structure I have in mind 
looks something like this (I still need to sort out the `OnceFut` handling, 
which is awkward at the moment). 
   
   But the idea is to wrap up the state needed to build output into a separate 
enum like the following:
   
   ```
   /// State machine for creating output for HashJoin
   ///
   /// TODO Add memory reservation for intermediate rows
   enum HashJoinOutput {
       /// output phase has not yet started; the left (build) input is still being read
       ReadingInput {
           /// future which builds hash table from left side
           left_fut: OnceFut<JoinLeftData>,
       },
       /// output phase has started, but there is no probe batch yet
       Ready {
           // TODO make this into the proper state
           left_fut: OnceFut<JoinLeftData>,
       },
       /// output is being built from probe batches
       Probing {
           data: JoinLeftData,
       },
       /// emitting any final unmatched indices, if any (depending on the join type)
       Unmatched {
           data: JoinLeftData,
       },
       /// Input is complete, and output is complete
       Done,
   }
   ```
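
To make the intended transitions concrete, here is a minimal, self-contained sketch of how stepping through such a state machine might look. It is not the DataFusion code: `LeftData`, `JoinStream`, `Step` and `probe_batches_left` are hypothetical stand-ins, the `OnceFut` is replaced by an `Option` that is `Some` once the build side is ready, and the `Ready` state is folded into `Probing` for brevity.

```rust
/// Hypothetical stand-in for `JoinLeftData` (the built hash table).
#[derive(Debug, PartialEq)]
struct LeftData {
    rows: usize,
}

/// Simplified version of the proposed states (assumption: no `Ready` state).
#[derive(Debug, PartialEq)]
enum JoinState {
    /// Build side not finished yet; `None` models a pending future.
    ReadingInput { left: Option<LeftData> },
    /// Output is being built from probe batches.
    Probing { data: LeftData },
    /// Emitting unmatched build rows, if the join type needs them.
    Unmatched { data: LeftData },
    Done,
}

/// What one call to `step` produced (hypothetical).
#[derive(Debug, PartialEq)]
enum Step {
    Pending,
    ProbeOutput,
    UnmatchedOutput { build_rows: usize },
    Finished,
}

struct JoinStream {
    state: JoinState,
    /// Number of probe batches still to be consumed (stand-in for the probe stream).
    probe_batches_left: usize,
}

impl JoinStream {
    fn step(&mut self) -> Step {
        // Take ownership of the current state so its fields can move into the next one.
        let current = std::mem::replace(&mut self.state, JoinState::Done);
        let (next, out) = match current {
            JoinState::ReadingInput { left: None } => {
                (JoinState::ReadingInput { left: None }, Step::Pending)
            }
            JoinState::ReadingInput { left: Some(data) } => {
                (JoinState::Probing { data }, Step::Pending)
            }
            JoinState::Probing { data } if self.probe_batches_left > 0 => {
                self.probe_batches_left -= 1;
                (JoinState::Probing { data }, Step::ProbeOutput)
            }
            // Probe side exhausted: move on to the unmatched rows.
            JoinState::Probing { data } => (JoinState::Unmatched { data }, Step::Pending),
            JoinState::Unmatched { data } => (
                JoinState::Done,
                Step::UnmatchedOutput { build_rows: data.rows },
            ),
            JoinState::Done => (JoinState::Done, Step::Finished),
        };
        self.state = next;
        out
    }
}
```

The point of the `std::mem::replace` is that each state owns exactly the data it needs, so the build-side data moves from one variant to the next rather than living in `Option` fields on the stream.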
   
   Then I think adding the logic to incrementally compute the matching indices 
is more tractable (though as you say @korowa we'll still have to protect 
against pathological cases where each input row in the probe batch matches all 
the rows in the hash table). 
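
As a rough illustration of what "incrementally" could mean here, the sketch below caps the number of index pairs produced per call and remembers where it stopped, so a probe row that matches every build row is spread over several output chunks. Everything in it is an assumption for illustration: the `HashMap<u64, Vec<usize>>` stands in for the real hash table, and `Cursor`, `next_matches` and `limit` are hypothetical names. It also assumes `limit > 0`.

```rust
use std::collections::HashMap;

/// Resume point inside the current probe batch (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq, Default)]
struct Cursor {
    /// Index of the probe row currently being matched.
    probe_row: usize,
    /// How many of that row's build-side matches were already emitted.
    match_offset: usize,
}

/// Returns at most `limit` (build_index, probe_index) pairs, starting at
/// `cursor` and advancing it so the next call continues where this one stopped.
/// An empty result means the probe batch is exhausted.
fn next_matches(
    build: &HashMap<u64, Vec<usize>>,
    probe_keys: &[u64],
    cursor: &mut Cursor,
    limit: usize,
) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    while cursor.probe_row < probe_keys.len() {
        // Build rows sharing the key of the current probe row (possibly none).
        let matches: &[usize] = match build.get(&probe_keys[cursor.probe_row]) {
            Some(v) => v,
            None => &[],
        };
        while cursor.match_offset < matches.len() {
            if out.len() == limit {
                // Chunk is full; the cursor points at the first pair not yet emitted.
                return out;
            }
            out.push((matches[cursor.match_offset], cursor.probe_row));
            cursor.match_offset += 1;
        }
        // All matches for this probe row are emitted; move to the next row.
        cursor.probe_row += 1;
        cursor.match_offset = 0;
    }
    out
}
```

With this shape the memory used for intermediate indices is bounded by `limit` regardless of how many build rows each probe row matches, at the cost of keeping the cursor as part of the probing state.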
   
   I think this will take me a few days to code up realistically, and 
https://github.com/apache/arrow-datafusion/issues/8078 is higher priority for 
me. However, I think we'll be able to make this work.

