lidavidm commented on issue #71:
URL: https://github.com/apache/arrow-adbc/issues/71#issuecomment-1222665133

   ## ConnectorX
   
   - Can partition the query along a given column, then fetch the partitions in 
parallel
   - "Copy-exactly-once" architecture
   - Uses preallocated buffers where possible (it also appears to do things like 
implement its own conversion to Python strings)
   
   These optimizations would probably be difficult to support, though we should 
preallocate where possible.
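
As a rough illustration of the partitioning idea in the first bullet, here is a minimal Python sketch. It is not ConnectorX's implementation: it uses the standard-library `sqlite3` module as a stand-in database, and the table `t`, its columns, and the helper names (`partition_bounds`, `fetch_partition`, `parallel_fetch`, `make_demo_db`) are all hypothetical. It assumes a non-empty table with an integer partition column.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def partition_bounds(lo, hi, n):
    """Split the inclusive range [lo, hi] into at most n half-open ranges."""
    step = -(-(hi - lo + 1) // n)  # ceiling division
    return [(s, min(s + step, hi + 1)) for s in range(lo, hi + 1, step)]


def fetch_partition(path, lo, hi):
    # Each worker opens its own connection rather than sharing one.
    conn = sqlite3.connect(path)
    try:
        return conn.execute(
            "SELECT id, val FROM t WHERE id >= ? AND id < ? ORDER BY id",
            (lo, hi),
        ).fetchall()
    finally:
        conn.close()


def parallel_fetch(path, n_partitions):
    # Look up the range of the partition column first (assumes a non-empty table).
    conn = sqlite3.connect(path)
    try:
        lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM t").fetchone()
    finally:
        conn.close()
    bounds = partition_bounds(lo, hi, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        # pool.map preserves input order, so the partitions come back in id order.
        parts = pool.map(lambda b: fetch_partition(path, *b), bounds)
        return [row for part in parts for row in part]


def make_demo_db(n_rows):
    """Create a throwaway database file with a hypothetical table t(id, val)."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)", [(i, f"v{i}") for i in range(n_rows)]
    )
    conn.commit()
    conn.close()
    return path
```

Calling `parallel_fetch(make_demo_db(10), 4)` issues four range queries concurrently and concatenates the results. Evenly spaced ranges only balance the work when the column values are roughly uniform.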
   
   ## Turbodbc
   
   ConnectorX's docs compare it to Turbodbc, which tends to trail it, though 
Turbodbc does not appear to implement parallelization (that might explain the 
difference). 
   
   Turbodbc also lists some optimizations:
   https://turbodbc.readthedocs.io/en/latest/pages/advanced_usage.html
   
   In particular, it can interleave I/O and conversion. That may be interesting 
for us, though libpq seems to offer only two choices: row-at-a-time, or 
getting all query results at once. 
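
One way to picture interleaving I/O and conversion is a bounded queue between a fetching thread and a converting thread. The sketch below is only an illustration, not Turbodbc's actual mechanism: `fetch_batches` and `convert_batch` are hypothetical stand-ins for the network read and the row-to-column conversion, and the batch size and queue depth are arbitrary.

```python
import queue
import threading


def fetch_batches(rows, batch_size):
    """Stand-in for the I/O side: yields lists of raw rows."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]


def convert_batch(batch):
    """Stand-in for conversion: transpose a batch of rows into columns."""
    return [list(col) for col in zip(*batch)]


def interleaved(rows, batch_size, depth=2):
    # The bounded queue lets the producer fetch at most `depth` batches
    # ahead while the consumer is still converting an earlier one.
    q = queue.Queue(maxsize=depth)
    done = object()  # sentinel marking the end of the stream

    def producer():
        for batch in fetch_batches(rows, batch_size):
            q.put(batch)
        q.put(done)

    t = threading.Thread(target=producer, daemon=True)
    t.start()
    out = []
    while True:
        batch = q.get()
        if batch is done:
            break
        out.append(convert_batch(batch))
    t.join()
    return out
```

With a real driver the fetch would block on the network, which is where the overlap pays off. In pure Python the GIL limits the gain for CPU-bound conversion.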
   
   Turbodbc also implements some _memory_ optimizations: dictionary-encoding 
string fields, and dynamically determining the minimum integer width.
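
Both memory optimizations are simple to sketch in plain Python. The functions below are hypothetical illustrations of the two ideas, not Turbodbc's code: `dictionary_encode` replaces repeated strings with indices into a list of distinct values, and `min_int_width` picks the narrowest signed width (8, 16, 32, or 64 bits) that holds every value in a column.

```python
def dictionary_encode(values):
    """Return (dictionary, indices); None (NULL) values stay None in indices."""
    dictionary = []
    positions = {}
    indices = []
    for v in values:
        if v is None:
            indices.append(None)
            continue
        if v not in positions:
            positions[v] = len(dictionary)
            dictionary.append(v)
        indices.append(positions[v])
    return dictionary, indices


def min_int_width(values):
    """Return the smallest signed bit width that can hold every non-NULL value."""
    present = [v for v in values if v is not None]
    lo = min(present, default=0)
    hi = max(present, default=0)
    for bits in (8, 16, 32, 64):
        if -(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1:
            return bits
    raise OverflowError("value does not fit in a signed 64-bit integer")
```

Note that choosing the width dynamically means the full column (or at least a full batch) has to be seen before the output buffer type is known, which works against preallocation.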
   
   ## pgeon
   - [Uses 
COPY](https://github.com/0x0L/pgeon/blob/daa0a82429934511f6863f637da484f707c815f9/src/c%2B%2B/pg_interface.cc)
 (DuckDB appears to do this too, though note that DuckDB's postgres extension 
is GPL). That honestly seems to be the main optimization.
   - Queries some metadata tables up front to determine proper types
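
To give a sense of what a COPY-based reader has to decode, here is a minimal Python sketch of parsing PostgreSQL's binary COPY format as described in the PostgreSQL documentation: an 11-byte signature, a 32-bit flags field, a 32-bit header extension length, then tuples made of a 16-bit field count and length-prefixed fields, ending with a field count of -1. This is not pgeon's code (which is C++); `parse_copy_binary` is a hypothetical helper that returns raw field bytes and leaves type-specific decoding to the caller.

```python
import struct

SIGNATURE = b"PGCOPY\n\xff\r\n\x00"  # 11-byte binary COPY signature


def parse_copy_binary(data):
    """Parse a binary COPY stream into rows of raw field bytes (None for NULL)."""
    if data[:11] != SIGNATURE:
        raise ValueError("not a binary COPY stream")
    # Flags and header extension length are 32-bit, network byte order.
    flags, ext_len = struct.unpack_from("!II", data, 11)
    pos = 19 + ext_len  # skip the header extension area, if any
    rows = []
    while True:
        (n_fields,) = struct.unpack_from("!h", data, pos)
        pos += 2
        if n_fields == -1:  # file trailer
            return rows
        row = []
        for _ in range(n_fields):
            (length,) = struct.unpack_from("!i", data, pos)
            pos += 4
            if length == -1:  # NULL: no value bytes follow
                row.append(None)
            else:
                row.append(data[pos:pos + length])
                pos += length
        rows.append(row)
```

Each field arrives as bytes in the column type's binary representation (for example, a 4-byte big-endian integer for `int4`), so the reader needs the column types up front to decode them.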

