agavra commented on issue #9959: URL: https://github.com/apache/pinot/issues/9959#issuecomment-1347039570
Three potential fixes:

1. Have two callbacks, onDataAvailable and onDataConsumed, and only "use" a seen mail notification when onDataConsumed is called. The upside is that this gives the scheduler a lot of flexibility. The downside is that if data is available on the probing side of the join but not on the broadcast side, the join will keep being scheduled unless I add some really fancy scheduling logic that knows to only schedule joins once one mailbox is complete.
2. I can make the HashJoinOperator cache the data it reads from the probing mailbox. The obvious issue there is potential memory pressure, which would be mitigated once flow control is in place.
3. Only schedule when _seenMail contains mail from the "first" mailbox in the operator's list of mailboxes, instead of from any mailbox the operator reads from. We could make this more generic: instead of just using the first, the API could return whichever mailboxes the operator is ready to read from. The downside is that this requires some pretty tightly coupled abstractions, so we need to think the API design through carefully.
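To make the trade-off in option 1 (two callbacks) concrete, here is a minimal sketch. It assumes mailboxes are identified by plain strings and that the scheduler keeps rescheduling an operator as long as any notification is unconsumed. The class TwoCallbackTracker and the isSchedulable method are hypothetical and not Pinot's actual scheduler API; only the callback names come from the comment above.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of option 1: a "seen mail" notification is only used up
 * when the operator reports that it actually consumed a block from that mailbox.
 */
final class TwoCallbackTracker {
  // Per mailbox: number of blocks that have arrived but were not consumed yet.
  private final Map<String, Integer> _unconsumed = new HashMap<>();

  /** Called by the mailbox layer when a block arrives. */
  synchronized void onDataAvailable(String mailboxId) {
    _unconsumed.merge(mailboxId, 1, Integer::sum);
  }

  /** Called by the operator after it has consumed one block from the mailbox. */
  synchronized void onDataConsumed(String mailboxId) {
    // Decrement the count; returning null removes the entry once it hits zero.
    _unconsumed.computeIfPresent(mailboxId, (id, n) -> n > 1 ? Integer.valueOf(n - 1) : null);
  }

  /** Hypothetical scheduler check: run the operator while anything is unconsumed. */
  synchronized boolean isSchedulable() {
    return !_unconsumed.isEmpty();
  }
}
```

The downside shows up directly: if only the probe mailbox signals data and the join refuses to consume it until the broadcast side is done, isSchedulable() stays true and the operator is rescheduled repeatedly without making progress.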
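Option 2 (caching probe-side data in the join) could look roughly like the sketch below. It assumes rows are {key, value} string pairs and that the operator is told explicitly when the build (broadcast) side is complete. BufferingHashJoin and its methods are hypothetical stand-ins, not the real HashJoinOperator. The unbounded _probeCache is exactly where the memory-pressure concern comes from; flow control would be what bounds it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of option 2: probe blocks that arrive before the build
 * side is complete are cached instead of being left unread in the mailbox.
 */
final class BufferingHashJoin {
  private final Map<String, List<String>> _hashTable = new HashMap<>();
  // Unbounded cache of early probe blocks: the memory-pressure risk.
  private final Deque<List<String[]>> _probeCache = new ArrayDeque<>();
  private boolean _buildComplete = false;

  /** Insert a block of build-side {key, value} rows into the hash table. */
  void onBuildBlock(List<String[]> rows) {
    for (String[] row : rows) {
      _hashTable.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
    }
  }

  /** Mark the build side finished and join every probe block cached so far. */
  List<String> onBuildComplete() {
    _buildComplete = true;
    List<String> out = new ArrayList<>();
    while (!_probeCache.isEmpty()) {
      out.addAll(probe(_probeCache.poll()));
    }
    return out;
  }

  /** Join a probe block immediately if possible, otherwise cache it. */
  List<String> onProbeBlock(List<String[]> rows) {
    if (!_buildComplete) {
      _probeCache.add(rows);
      return List.of();
    }
    return probe(rows);
  }

  int cachedBlocks() {
    return _probeCache.size();
  }

  // Emits "key:buildValue,probeValue" for every matching pair (inner join).
  private List<String> probe(List<String[]> rows) {
    List<String> out = new ArrayList<>();
    for (String[] row : rows) {
      for (String buildValue : _hashTable.getOrDefault(row[0], List.of())) {
        out.add(row[0] + ":" + buildValue + "," + row[1]);
      }
    }
    return out;
  }
}
```

Because the operator always drains whichever mailbox has data, a probe-side notification is never "wasted", so the scheduler can stay simple; the cost is moved into operator memory instead.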
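The generic form of option 3, where the operator tells the scheduler which mailboxes it is ready to read from, might be sketched as follows. Everything here is hypothetical: MailboxAwareOperator, mailboxesReadyToRead, JoinReadiness and SelectiveScheduler are illustrative names, and mailbox IDs are assumed to be strings. The _seenMail field mirrors the name in the comment, but its real type and semantics in Pinot may differ. A join reports only its build mailbox until the build side is complete, and only its probe mailbox afterwards.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical hook: which mailboxes can this operator make progress on right now? */
interface MailboxAwareOperator {
  Set<String> mailboxesReadyToRead();
}

/** Hypothetical join readiness: build (broadcast) mailbox first, then probe. */
final class JoinReadiness implements MailboxAwareOperator {
  private final String _buildMailbox;
  private final String _probeMailbox;
  private boolean _buildComplete = false;

  JoinReadiness(String buildMailbox, String probeMailbox) {
    _buildMailbox = buildMailbox;
    _probeMailbox = probeMailbox;
  }

  void markBuildComplete() {
    _buildComplete = true;
  }

  @Override
  public Set<String> mailboxesReadyToRead() {
    return Set.of(_buildComplete ? _probeMailbox : _buildMailbox);
  }
}

/** Hypothetical scheduler that only reacts to mail the operator can actually read. */
final class SelectiveScheduler {
  private final Set<String> _seenMail = new HashSet<>();

  void onDataAvailable(String mailboxId) {
    _seenMail.add(mailboxId);
  }

  /**
   * Returns true if the operator should run. Only notifications for mailboxes the
   * operator is ready to read are consumed; the rest stay in _seenMail for later.
   */
  boolean trySchedule(MailboxAwareOperator op) {
    Set<String> ready = op.mailboxesReadyToRead();
    return _seenMail.removeIf(ready::contains);
  }
}
```

The coupling mentioned above is visible here: the scheduler's correctness now depends on every operator reporting its readiness accurately, which is why the API needs careful design.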
