agavra commented on issue #9959:
URL: https://github.com/apache/pinot/issues/9959#issuecomment-1347039570

   Three potential fixes:
   
   1. Have two callbacks, onDataAvailable and onDataConsumed, and only “use” a 
seen-mail notification when onDataConsumed is called. The upside is that this 
gives the scheduler a lot of flexibility. The downside is that if data is 
available from the probing side of the join but not the broadcast side, the 
join will keep being scheduled unless I add some really fancy scheduling logic 
that knows to only schedule joins when one mailbox is complete.
   2. I can make the HashJoinOperator cache the data it reads from the probing 
mailbox. The obvious issue there is potential memory pressure, though this 
would be mitigated with flow control in place.
   3. Only schedule when _seenMail contains the “first” mailbox in the 
operator's list of mailboxes, instead of any mailbox the operator reads from. 
We could make this more generic: instead of just using the first, the API 
could return any mailboxes that we’re ready to read from. The downside is that 
this requires some pretty tightly coupled abstractions, so we need to think 
through the API design well.
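
To make option 3 concrete, here is a minimal Java sketch of the "return any mailboxes we're ready to read from" idea. Everything in it is hypothetical and for illustration only: `SeenMailScheduler`, `MailboxReader`, `readyMailboxes`, `JoinReader`, and the string mailbox ids are assumed names, not existing Pinot APIs. It also borrows the onDataAvailable / onDataConsumed naming from option 1 so that a seen-mail entry is only cleared once the data has actually been consumed.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these types exist in Pinot under these names.
final class SeenMailScheduler {

  /** Assumed operator-side API: the mailboxes the operator can consume right now. */
  interface MailboxReader {
    List<String> readyMailboxes();
  }

  // Mailboxes that have announced data which has not been consumed yet.
  private final Set<String> seenMail = new HashSet<>();

  /** Called when a mailbox reports that new data has arrived. */
  void onDataAvailable(String mailboxId) {
    seenMail.add(mailboxId);
  }

  /** Called once the operator has actually read the data from a mailbox. */
  void onDataConsumed(String mailboxId) {
    seenMail.remove(mailboxId);
  }

  /** Schedule only if mail was seen on a mailbox the operator is ready to read. */
  boolean shouldSchedule(MailboxReader operator) {
    for (String mailboxId : operator.readyMailboxes()) {
      if (seenMail.contains(mailboxId)) {
        return true;
      }
    }
    return false;
  }
}

// Hypothetical join-like reader: it wants the broadcast side first, then the probe side.
final class JoinReader implements SeenMailScheduler.MailboxReader {
  private boolean broadcastComplete = false;

  void markBroadcastComplete() {
    broadcastComplete = true;
  }

  @Override
  public List<String> readyMailboxes() {
    return Collections.singletonList(broadcastComplete ? "probe" : "broadcast");
  }
}
```

With this shape, mail seen only on the probe mailbox does not trigger scheduling until the reader reports that the broadcast side is complete. The coupling mentioned in option 3 shows up in `readyMailboxes`: the scheduler has to ask the operator about its internal read order.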


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

