joellubi commented on PR #2254: URL: https://github.com/apache/arrow-adbc/pull/2254#issuecomment-2414404754
> @joellubi Most of the performance actually came from the improved handling of the channels rather than the switch to using `SHOW`, since those only replaced the calls selecting from `information_schema.schemata` etc.
>
> The way the channels were being handled caused bottlenecks, since we weren't using buffered channels and the record reader was being passed through a channel instead of just being used directly. Switching up the management of the channels led to about a 25% improvement in performance by removing the blocking. My tests showed a drop from ~5s to ~3.5s for a large GetObjects scenario. About 2/3 of the time is the raw Snowflake execution, which for the ADBC account is taking a total of around 2 - 3 seconds, depending on the query, for all of the `SHOW` queries + the primary one.

Ah cool, the record reader handling is much cleaner now. Not sure why I did it that way originally.

Good catch on increasing the buffer size for the channel. I did think that could be a bottleneck, which is why I didn't make it unbuffered, but I didn't think it would be so significant. I also couldn't think of a value to use that didn't feel somewhat arbitrary. Maybe make it configurable, or set it to `runtime.NumCPU()`? Not critical, but could be nice.
