Hi all,
A while back, someone raised on this list
(https://lists.apache.org/thread/35zzzh2jgorhx7q2xksp7rwxnt6gl2zx) that
once Polaris is bootstrapped, simple operational questions ("how many
tables in this namespace?", "how many snapshots?", "any small files?")
force you to switch to Spark/Trino/pyiceberg and write a script.
That gave me an idea for a standalone SQL shell (ANTLR grammar + REST
catalog client, ships as a shadow jar).
Code:

https://github.com/bbejeck/polaris/tree/add-sql-module/extensions/sql-engine
End-to-end demo (docker-compose + MinIO, runs locally):

https://github.com/bbejeck/polaris/tree/add-sql-module/extensions/sql-engine/demo
It adds Iceberg-aware statements like SHOW TABLES, DESCRIBE STATS,
SHOW TABLE LOCATION/POLICIES, DIAGNOSE TABLE, and EXPLAIN so you can
poke at namespaces, snapshots, small-file diagnostics, etc., without
firing up Spark or Trino. The demo above spins up Polaris + MinIO,
seeds three Iceberg tables, and lets you try all of the statements
above in a few minutes.
A few things worth stating upfront, because the "yet another SQL
dialect" worry is real:
 - Not trying to compete with Spark/Trino/Doris SQL.
 - The SELECT support is "peek at a table from the shell", not
   "run analytics".
 - Dialect baseline I'd propose: small SQL-92 read-only subset
   plus a handful of named Polaris extension statements
   (DESCRIBE STATS, DIAGNOSE TABLE, etc.). Easy to maintain,
   intentionally orthogonal to the engines.
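
To make the "read-only subset plus named extensions" baseline concrete,
here is a rough Python sketch of the kind of gate it implies. It is an
illustration only, not the ANTLR grammar from the branch: the name
is_allowed, the prefix table, and the reading of
"SHOW TABLE LOCATION/POLICIES" as two separate statements are all
assumptions.

```python
# Hypothetical sketch: NOT the grammar from the linked branch.
# Accept only statements that start with a known read-only prefix.
READ_ONLY_PREFIXES = (
    ("SELECT",),
    ("EXPLAIN",),
    ("SHOW", "TABLES"),
    ("SHOW", "TABLE", "LOCATION"),   # assumed spelling
    ("SHOW", "TABLE", "POLICIES"),   # assumed spelling
    ("DESCRIBE", "STATS"),
    ("DIAGNOSE", "TABLE"),
)


def is_allowed(statement: str) -> bool:
    """Return True if the statement begins with an allowed prefix."""
    # Normalize: trim whitespace, drop a trailing ';', upper-case, tokenize.
    tokens = tuple(statement.strip().rstrip(";").upper().split())
    return any(tokens[: len(p)] == p for p in READ_ONLY_PREFIXES)
```

With this check, "show tables;" and "SELECT * FROM t" pass while
"DROP TABLE t" is rejected. A real parser would also have to look
inside statements such as EXPLAIN, which a prefix check does not do.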

If anyone thinks this could be useful, I'd love to discuss next steps.
I'm new to the Polaris community, but I know that in Apache Kafka
something like this would require a KIP, so I'm also willing to write
a more formal design doc.
Thanks,
— Bill Bejeck
