iChauster commented on PR #13314:
URL: https://github.com/apache/arrow/pull/13314#issuecomment-1149099821
> > I did observe a speedup, but it does not exactly match my outputs for the ExpressionOverhead testing I have on my laptop. Let me know if I missed something in my code.
>
> So I guess the problem is that if we call `InputReceived` manually then we do not get any parallelism (that is today handled by the source node). So I think we will need to do that manually as well.
>
> We could manually schedule in the benchmark by creating a new TaskScheduler. [Here is a rough example](https://github.com/apache/arrow/commit/6b0069d97e70394923fcaea5ab468f85eb282d1c) that could be cleaned up. It's a bit complex but we could start to share this logic if we are going to test all the nodes.

Yes, I suspected it had to do with batch delivery, especially since some of the metrics were slower than the source + projection + sink versions. By the way, I found that the `TaskScheduler` version is actually less performant for some reason. Is it competitive with `ExpressionOverhead` on your setup?
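To make the parallelism point concrete, here is a minimal C++ sketch. It is not Arrow code: `CountingNode`, `DeliverSerially`, and `DeliverInParallel` are hypothetical names, and plain `std::thread` workers stand in for whatever the source node's executor or a `TaskScheduler` would actually do. It only contrasts calling `InputReceived` in a loop on the benchmark thread with spreading those calls across several threads.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for an exec node (not Arrow's ExecNode API).
// InputReceived may be called from several threads at once, so state is atomic.
struct CountingNode {
  std::atomic<int64_t> rows_seen{0};
  std::atomic<int64_t> sum{0};

  void InputReceived(const std::vector<int64_t>& batch) {
    int64_t local = 0;
    for (int64_t v : batch) local += v;  // stand-in for per-batch work
    sum.fetch_add(local, std::memory_order_relaxed);
    rows_seen.fetch_add(static_cast<int64_t>(batch.size()),
                        std::memory_order_relaxed);
  }
};

// Calling InputReceived directly from the benchmark loop: every batch is
// processed on the calling thread, one after another (no parallelism).
void DeliverSerially(CountingNode& node,
                     const std::vector<std::vector<int64_t>>& batches) {
  for (const auto& batch : batches) node.InputReceived(batch);
}

// Manually scheduled delivery: batch indices are striped across worker
// threads, roughly the role a source node plus executor would otherwise play.
void DeliverInParallel(CountingNode& node,
                       const std::vector<std::vector<int64_t>>& batches,
                       std::size_t num_threads) {
  assert(num_threads > 0);  // precondition: at least one worker
  std::vector<std::thread> workers;
  workers.reserve(num_threads);
  for (std::size_t t = 0; t < num_threads; ++t) {
    workers.emplace_back([&node, &batches, t, num_threads] {
      // Worker t handles batches t, t + num_threads, t + 2 * num_threads, ...
      for (std::size_t i = t; i < batches.size(); i += num_threads) {
        node.InputReceived(batches[i]);
      }
    });
  }
  for (auto& w : workers) w.join();  // wait until every batch is delivered
}
```

Striping by index gives each worker a disjoint set of batches, so the only shared state is the node's atomics. A real benchmark would reuse Arrow's own executor/scheduler rather than raw threads, and thread start-up and scheduling overhead are one plausible reason a manually scheduled version can come out slower than expected.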
