Hi all,

I'd like to propose adding four new SqlInfo codes to FlightSql.proto to
fill gaps in dialect metadata that clients need when compiling SQL
per-backend:

- SQL_SUPPORTED_LIMIT_OFFSET (577) — row-limit / offset grammar
(LIMIT/OFFSET, OFFSET…FETCH, TOP)
- SQL_SUPPORTED_NULLS_ORDERING (578) — explicit NULLS FIRST / NULLS LAST
support in ORDER BY (distinct from the existing SQL_NULL_ORDERING (507),
which reports the server's *default* null ordering)
- SQL_SUPPORTED_BOOLEAN_LITERAL (579) — accepted boolean literal forms
(TRUE/FALSE, 1/0)
- SQL_SUPPORTED_DATETIME_LITERAL (580) — accepted date/time/timestamp
literal forms (ANSI DATE '…' keyword vs. bare quoted string)

The goal here is intentionally narrow: to give clients just enough dialect
metadata to emit correct SQL for common pushdown operations (predicate
pushdown, projection pushdown, LIMIT/OFFSET, ORDER BY). It is explicitly
not an attempt to describe enough of each dialect to support
general-purpose SQL generation; Substrait is probably the right long-term
answer for engines that need to push arbitrary plans across backends. These
codes are a pragmatic solution for the much smaller surface area that
pushdown requires.

All four are int32 bitmasks (not scalar enums), following the existing
SQL_SUPPORTED_GROUP_BY / SupportedSqlGrammar convention — dialects
frequently accept multiple forms (e.g. PostgreSQL supports both
LIMIT/OFFSET and OFFSET/FETCH; MySQL accepts both TRUE/FALSE and 1/0). The
accompanying enums are intentionally minimal — just enough for current use
cases.
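
To illustrate how a client might consume one of these bitmasks, here is a
rough Python sketch for SQL_SUPPORTED_LIMIT_OFFSET. Only the code number
(577) comes from this proposal; the flag names, bit positions, and the
limit_clause helper are hypothetical placeholders and do not reflect the
enum in the PR.

```python
from enum import IntFlag
from typing import Optional

# Code number from the proposal.
SQL_SUPPORTED_LIMIT_OFFSET = 577


class LimitOffsetSupport(IntFlag):
    """Hypothetical bit assignments, for illustration only."""
    LIMIT_OFFSET = 1 << 0   # ... LIMIT n OFFSET m
    OFFSET_FETCH = 1 << 1   # ... OFFSET m ROWS FETCH NEXT n ROWS ONLY
    TOP = 1 << 2            # SELECT TOP n ...


def limit_clause(bitmask: int, limit: int, offset: int = 0) -> Optional[str]:
    """Return a trailing row-limit clause the server reports as accepted.

    Returns None when no trailing form is reported, in which case the
    client would skip the pushdown and apply the limit locally.
    """
    if bitmask & LimitOffsetSupport.LIMIT_OFFSET:
        return f"LIMIT {limit} OFFSET {offset}"
    if bitmask & LimitOffsetSupport.OFFSET_FETCH:
        return f"OFFSET {offset} ROWS FETCH NEXT {limit} ROWS ONLY"
    # TOP is a prefix form and needs different handling; not covered here.
    return None


# A server accepting both trailing forms would set both bits.
sql_info = {SQL_SUPPORTED_LIMIT_OFFSET: 0b011}
clause = limit_clause(sql_info[SQL_SUPPORTED_LIMIT_OFFSET], limit=10, offset=5)
```

Because the value is a bitmask rather than a scalar enum, the client can
pick whichever accepted form it prefers instead of being told a single one.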

The immediate motivation is enabling the ADBC data source in Spark
(apache/spark#54603 <https://github.com/apache/spark/issues/54603>) without
hardcoding per-dialect configuration in Spark code, which is what the JDBC
source does today. Since ADBC reuses Flight SQL's SqlInfo codes, the change
applies to both.

- Issue: https://github.com/apache/arrow/issues/49792
- PR: https://github.com/apache/arrow/pull/49796

Thanks,
Tornike
