fwojciec commented on issue #1755:
URL: https://github.com/apache/arrow-adbc/issues/1755#issuecomment-3193941046
Not sure this is an option for ADBC (likely not, since it comes with at least one somewhat significant tradeoff), but maybe it's something that could be opted into via config? It's possible to introspect the result schema at query time with minimal overhead (benchmarked at ~178 microseconds per query on local connections) using prepared statements. Here's an example Go implementation from one of my projects, which uses pgx as the PostgreSQL driver:
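```go
// Imports this excerpt relies on. The module versions (pgx v5, arrow-go v18)
// are assumptions; Pool, ColumnInfo, CreateSchema, and SchemaError are
// project-specific types defined elsewhere.
import (
	"context"
	"fmt"

	"github.com/apache/arrow-go/v18/arrow"
	"github.com/jackc/pgx/v5/pgxpool"
)

// GetQueryMetadata uses PREPARE to extract column metadata without
// executing the query.
func (p *Pool) GetQueryMetadata(ctx context.Context, conn *pgxpool.Conn, sql string) (*arrow.Schema, []uint32, error) {
	// Generate a unique statement name to avoid collisions in concurrent
	// usage.
	stmtName := fmt.Sprintf("pgarrow_meta_%p", conn)

	// Prepare only parses and describes the statement; the server returns the
	// result columns' names and type OIDs without running the query.
	sd, err := conn.Conn().Prepare(ctx, stmtName, sql)
	if err != nil {
		return nil, nil, fmt.Errorf("failed to prepare statement for metadata discovery: %w", err)
	}
	defer func() {
		// Deallocate drops the statement on the server and from pgx's
		// prepared-statement map; cleanup errors are deliberately ignored.
		_ = conn.Conn().Deallocate(ctx, stmtName)
	}()

	if len(sd.Fields) == 0 {
		return nil, nil, fmt.Errorf("query returned no columns - Arrow conversion requires queries that return at least one column")
	}

	// Collect each result column's name and PostgreSQL type OID.
	columns := make([]ColumnInfo, len(sd.Fields))
	fieldOIDs := make([]uint32, len(sd.Fields))
	for i, field := range sd.Fields {
		columns[i] = ColumnInfo{
			Name: field.Name,
			OID:  field.DataTypeOID,
		}
		fieldOIDs[i] = field.DataTypeOID
	}

	// Map the OIDs to Arrow types (CreateSchema is project-specific).
	schema, err := CreateSchema(columns)
	if err != nil {
		return nil, nil, &SchemaError{
			Columns: columns,
			Err:     err,
		}
	}
	return schema, fieldOIDs, nil
}
```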
The main tradeoff is that this method doesn't detect column nullability, so every column has to be treated as nullable. I've personally found that acceptable: as far as I know, knowing nullability brings no performance benefit when converting data, and the query engine I was using at the time (DuckDB) treats all Arrow data as nullable anyway (which arguably also follows Arrow's design philosophy).
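Because RowDescription carries each column's name and type OID (plus a few other attributes) but no nullability flag, building the schema comes down to mapping OIDs to Arrow types and marking every field nullable. The project's `CreateSchema` isn't shown, so what follows is a hypothetical, dependency-free stand-in: `createSchemaSketch`, the `Field` struct, and the Arrow type-name strings are illustrative assumptions (a real version would construct `arrow.Field` values with arrow-go). The OIDs are PostgreSQL's fixed built-in type OIDs; the mapping covers only a few common types and rejects everything else.

```go
package main

import "fmt"

// Built-in PostgreSQL type OIDs (fixed values from pg_type).
const (
	oidBool        = 16
	oidBytea       = 17
	oidInt8        = 20
	oidInt2        = 21
	oidInt4        = 23
	oidText        = 25
	oidFloat4      = 700
	oidFloat8      = 701
	oidVarchar     = 1043
	oidDate        = 1082
	oidTimestamp   = 1114
	oidTimestamptz = 1184
)

// ColumnInfo mirrors the name + OID pair collected in GetQueryMetadata.
type ColumnInfo struct {
	Name string
	OID  uint32
}

// Field is a hypothetical stand-in for arrow.Field, with the type kept as a string.
type Field struct {
	Name     string
	Type     string
	Nullable bool
}

// oidToArrow is one reasonable mapping; PostgreSQL timestamps have
// microsecond resolution, hence "us".
var oidToArrow = map[uint32]string{
	oidBool: "bool", oidBytea: "binary",
	oidInt2: "int16", oidInt4: "int32", oidInt8: "int64",
	oidFloat4: "float32", oidFloat8: "float64",
	oidText: "utf8", oidVarchar: "utf8",
	oidDate: "date32", oidTimestamp: "timestamp[us]", oidTimestamptz: "timestamp[us, tz=UTC]",
}

// createSchemaSketch maps each column's OID to an Arrow type name. Every field
// is nullable because the prepared statement's description says nothing
// about nullability.
func createSchemaSketch(cols []ColumnInfo) ([]Field, error) {
	fields := make([]Field, len(cols))
	for i, c := range cols {
		t, ok := oidToArrow[c.OID]
		if !ok {
			return nil, fmt.Errorf("column %q: unsupported type OID %d", c.Name, c.OID)
		}
		fields[i] = Field{Name: c.Name, Type: t, Nullable: true}
	}
	return fields, nil
}

func main() {
	fields, err := createSchemaSketch([]ColumnInfo{{Name: "id", OID: oidInt8}, {Name: "name", OID: oidText}})
	fmt.Println(fields, err)
}
```

Types such as `numeric` (OID 1700) fall through to the error branch here; whether to map them to a decimal or a string is a design decision this sketch leaves open.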
Just sharing because it might be worth considering in this context. No idea how to write it in C++, though... :/