lidavidm commented on issue #71: URL: https://github.com/apache/arrow-adbc/issues/71#issuecomment-1222665133
## ConnectorX

- Can partition the query along a given column, then fetch the partitions in parallel
- "Copy-exactly-once" architecture
- Uses preallocated buffers where possible (also, appears to do things like implement its own conversion to Python strings)

These optimizations would probably be difficult to support, though we should preallocate where possible.

## Turbodbc

ConnectorX's docs compare it to Turbodbc, which tends to trail it, though Turbodbc does not appear to implement parallelization (that might explain the difference).

Turbodbc also lists some optimizations: https://turbodbc.readthedocs.io/en/latest/pages/advanced_usage.html

In particular, it can interleave I/O and conversion. That may be interesting for us, though libpq seems to offer only two choices: fetching row-at-a-time or getting all query results at once.

Turbodbc also implements some _memory_ optimizations: dictionary-encoding string fields, and dynamically determining the minimum integer width.

## pgeon

- [Uses COPY](https://github.com/0x0L/pgeon/blob/daa0a82429934511f6863f637da484f707c815f9/src/c%2B%2B/pg_interface.cc) (DuckDB appears to do this too, though note DuckDB's postgres extension is GPL)
  That honestly seems to be the main optimization
- Queries some metadata tables up front to determine proper types
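
For reference, the two Turbodbc memory optimizations mentioned above (dictionary-encoding string fields and picking the minimum integer width) could look roughly like the sketch below. This is plain Python for illustration only; it is not Turbodbc's actual implementation, and the function names `dictionary_encode` and `minimum_int_width` are made up here.

```python
def dictionary_encode(values):
    """Store each distinct string once and replace values with indices.

    Hypothetical helper: returns (dictionary, indices); None stays None.
    """
    dictionary = []   # distinct strings, in first-seen order
    positions = {}    # string -> index into `dictionary`
    indices = []
    for value in values:
        if value is None:
            indices.append(None)
            continue
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def minimum_int_width(values):
    """Return the smallest signed width (8, 16, 32 or 64 bits) that holds
    every non-null value. Hypothetical helper; defaults to 8 for no values.
    """
    present = [v for v in values if v is not None]
    if not present:
        return 8
    lo, hi = min(present), max(present)
    for bits in (8, 16, 32, 64):
        if -(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1:
            return bits
    raise OverflowError("value does not fit in a signed 64-bit integer")


dictionary, indices = dictionary_encode(["a", "b", "a", None, "b"])
width = minimum_int_width([1, 128])
```

Both need to see the whole column (or at least a whole batch) before the output type is known, so they would presumably be applied per batch rather than per row.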
