Re: [PR] feat(go/adbc/driver/snowflake): improve GetObjects performance and semantics [arrow-adbc]

via GitHub Tue, 15 Oct 2024 08:51:21 -0700


joellubi commented on PR #2254:
URL: https://github.com/apache/arrow-adbc/pull/2254#issuecomment-2414404754


   > @joellubi Most of the performance actually came from the improved handling 
of the channels rather than the switch to using `SHOW` since they only replaced 
the calls to selecting from `information_schema.schemata` etc.
   > 
   > The way the channels were being handled caused bottlenecks since we 
weren't using buffered channels and the record reader was being passed through 
a channel instead of just using it directly. Switching up the managing of the 
channels led to about a 25% improvement in performance by removing the 
blocking. My tests showed a drop from ~5s to ~3.5s for a large GetObjects 
scenario. About 2/3 of the time is the raw Snowflake execution, which for the 
ADBC account takes a total of around 2-3 seconds depending on the query 
for all of the `SHOW` queries + the primary one.
   
   Ah cool, the record reader handling is much cleaner now. Not sure why I did 
it that way originally.
   
   Good catch on increasing the buffer size for the channel. I did think that 
could be a bottleneck which is why I didn't make it unbuffered, but didn't 
think it would be so significant. I also couldn't think of a value to use that 
didn't feel somewhat arbitrary. Maybe making it configurable or set to 
`runtime.NumCPU()`? Not critical but could be nice.


