JerAguilon commented on PR #38380:
URL: https://github.com/apache/arrow/pull/38380#issuecomment-1781871517

> What's your thinking around the long term approach?
>
> * Migrate this node and asof join node to act more like the other nodes (no independent threads, works even if plan is multi-threaded)
> * Migrate the other nodes to be more like this node (all parallelism is within each node and we don't have any plan-level parallelism)
> * Keep the status quo (two types of nodes)

I think bullet one is a noble idea, if we could ensure no performance regressions. I lack historical context on why the asof_join_node actually needs a proper `std::thread`, but I do think that the asofjoin's compute model is quite elegant for these timeseries-ish operations. Perhaps @icexelloss can chime in?

If we keep the status quo, it'd be nice to add one more layer of abstraction so that the node itself doesn't need to bookkeep a process queue, since we're adding state to an already-stateful class. But that's a high-level idea and I don't have a clear implementation in mind yet.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
