fwojciec commented on issue #1755:
URL: https://github.com/apache/arrow-adbc/issues/1755#issuecomment-3193941046

   Not sure this is an option for ADBC (likely not, since it comes with at least one significant tradeoff), but maybe it's something that could be opted into via config? It's possible to do schema introspection at query time with minimal overhead (I benchmarked ~178 microseconds per query on local connections) using PREPARE statements. Here's an example Go implementation from one of my projects, using pgx as the Postgres driver:
   
   ```go
   // GetQueryMetadata uses PREPARE to extract column metadata without executing the query.
   func (p *Pool) GetQueryMetadata(ctx context.Context, conn *pgxpool.Conn, sql string) (*arrow.Schema, []uint32, error) {
       // Generate a unique statement name to avoid collisions in concurrent usage.
       stmtName := fmt.Sprintf("pgarrow_meta_%p", conn)

       sd, err := conn.Conn().Prepare(ctx, stmtName, sql)
       if err != nil {
           return nil, nil, fmt.Errorf("failed to prepare statement for metadata discovery: %w", err)
       }

       defer func() {
           _, _ = conn.Conn().Exec(ctx, "DEALLOCATE "+stmtName)
       }()

       if len(sd.Fields) == 0 {
           return nil, nil, fmt.Errorf("query returned no columns - Arrow conversion requires queries that return at least one column")
       }

       columns := make([]ColumnInfo, len(sd.Fields))
       fieldOIDs := make([]uint32, len(sd.Fields))
       for i, field := range sd.Fields {
           columns[i] = ColumnInfo{
               Name: field.Name,
               OID:  field.DataTypeOID,
           }
           fieldOIDs[i] = field.DataTypeOID
       }

       schema, err := CreateSchema(columns)
       if err != nil {
           return nil, nil, &SchemaError{
               Columns: columns,
               Err:     err,
           }
       }

       return schema, fieldOIDs, nil
   }
   ```
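   For context, here's a minimal usage sketch. The `Pool` construction is hypothetical (the real wrapper in my project is wired up differently, and the connection string is a placeholder), but the pgxpool calls are standard v5 API:

   ```go
   package main

   import (
       "context"
       "fmt"
       "log"

       "github.com/jackc/pgx/v5/pgxpool"
   )

   func main() {
       ctx := context.Background()

       // Standard pgx v5 pool setup; connection string is a placeholder.
       pgPool, err := pgxpool.New(ctx, "postgres://localhost:5432/mydb")
       if err != nil {
           log.Fatal(err)
       }
       defer pgPool.Close()

       // Hypothetical: however the Pool wrapper from the snippet above gets built.
       p := &Pool{}

       conn, err := pgPool.Acquire(ctx)
       if err != nil {
           log.Fatal(err)
       }
       defer conn.Release()

       // Discover the result schema without running the query.
       schema, oids, err := p.GetQueryMetadata(ctx, conn, "SELECT id, name FROM users")
       if err != nil {
           log.Fatal(err)
       }
       fmt.Println(schema) // Arrow schema inferred from the prepared statement
       fmt.Println(oids)   // Postgres type OIDs for each result column
   }
   ```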
   The main tradeoff is that this method doesn't detect column nullability - which I've personally found acceptable, since as far as I know there's no performance benefit to knowing nullability when converting data, and the query engine I was using at the time (DuckDB) treats all Arrow data as nullable anyway (which arguably also follows Arrow's design philosophy).
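   To make the nullability point concrete, here's a sketch of what a `CreateSchema`-style mapping can look like when every field is simply marked nullable. The OID switch is illustrative only - the real mapping covers far more types, and the Arrow Go module path depends on the version you're on:

   ```go
   import (
       "fmt"

       "github.com/apache/arrow/go/v15/arrow" // module path varies by Arrow version
   )

   // createSchemaSketch maps Postgres type OIDs to Arrow fields. PREPARE's
   // RowDescription doesn't report nullability, so every field is marked
   // Nullable: true. Illustrative only - a real mapping handles many more OIDs.
   func createSchemaSketch(columns []ColumnInfo) (*arrow.Schema, error) {
       fields := make([]arrow.Field, len(columns))
       for i, col := range columns {
           var dt arrow.DataType
           switch col.OID {
           case 20: // int8
               dt = arrow.PrimitiveTypes.Int64
           case 23: // int4
               dt = arrow.PrimitiveTypes.Int32
           case 25: // text
               dt = arrow.BinaryTypes.String
           default:
               return nil, fmt.Errorf("unsupported type OID %d", col.OID)
           }
           fields[i] = arrow.Field{Name: col.Name, Type: dt, Nullable: true}
       }
       return arrow.NewSchema(fields, nil), nil
   }
   ```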
   
   Just sharing because it might be something to consider in this context - no idea how to write it in C++, though... :/

