Hi everyone, I’m currently working on new table DDL statements for v2 tables. For context, the new logical plans for DataSourceV2 require a catalog interface so that Spark can create tables for operations like CTAS. The proposed TableCatalog API also includes an API for altering those tables so we can make ALTER TABLE statements work. I’m implementing those DDL statements, which will make it into upstream Spark when the TableCatalog PR is merged.
Since I’m adding new SQL statements that don’t yet exist in Spark, I want to make sure that the syntax I’m using in our branch will match the syntax we add to Spark later. I’m basing this proposed syntax on PostgreSQL <https://www.postgresql.org/docs/current/static/ddl-alter.html>.

- *Update data type*: ALTER TABLE tableIdentifier ALTER COLUMN qualifiedName TYPE dataType
- *Rename column*: ALTER TABLE tableIdentifier RENAME COLUMN qualifiedName TO qualifiedName
- *Drop column*: ALTER TABLE tableIdentifier DROP (COLUMN | COLUMNS) qualifiedNameList

A few notes:

- Using qualifiedName in these rules allows updating nested types, like point.x.
- Updates and renames can only alter one column, but drop can drop a list.
- Rename can’t move a field: if the TO name is qualified, it will validate that the prefix matches the original field’s.
- I’m also changing ADD COLUMN to support adding fields to nested columns by using qualifiedName instead of identifier.

Please reply to this thread if you have suggestions based on a different SQL engine or want this syntax to be different for another reason.

Thanks!

rb

--
Ryan Blue
Software Engineer
Netflix
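
As a rough illustration of the rename note above, here is a minimal Python sketch of the prefix check. It is not the Spark implementation: the function name resolve_rename is hypothetical, names are split naively on "." (quoted identifiers are ignored), and it assumes an unqualified TO name keeps the field under its original parent.

```python
def resolve_rename(original: str, target: str) -> str:
    """Return the full new name for RENAME COLUMN original TO target.

    Hypothetical helper for illustration only; assumes dot-separated
    names with no quoted identifiers.
    """
    # Split each name into its parent path and its leaf field name.
    *parent, _old_leaf = original.split(".")
    *target_parent, new_leaf = target.split(".")

    # A qualified TO name must keep the field under the same parent.
    if target_parent and target_parent != parent:
        raise ValueError(
            f"cannot rename {original} to {target}: "
            "prefix does not match the original field"
        )

    # Assumption: an unqualified TO name renames the field in place.
    return ".".join(parent + [new_leaf])
```

Under these assumptions, renaming point.x to point.z or to z both resolve to point.z, while renaming point.x to location.z is rejected.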