OK, let me put a proposal here:

1. Permanently ban CHAR for native data source tables, and only keep it for Hive compatibility.

It's OK to forget about padding, as Snowflake and MySQL have done. But it's hard for Spark to require consistent behavior for the CHAR type across all data sources. Since the CHAR type is not that useful nowadays, it seems OK to just ban it. Another option is to document that the padding of the CHAR type is data source dependent, but it's a bit weird to leave this inconsistency in Spark.
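To make the inconsistency concrete, here is a rough sketch (table names are made up, and the exact output depends on the data source; this assumes the current behavior where native tables treat CHAR as a plain string while Hive serde tables apply Hive's padding):

  CREATE TABLE native_t (c CHAR(5)) USING parquet;
  CREATE TABLE hive_t (c CHAR(5)) STORED AS parquet;
  INSERT INTO native_t VALUES ('ab');
  INSERT INTO hive_t VALUES ('ab');
  -- native_t would typically return 'ab' (length 2, no padding),
  -- while hive_t would typically return 'ab   ' (length 5, padded by Hive).
  SELECT c, length(c) FROM native_t;
  SELECT c, length(c) FROM hive_t;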
2. Leave VARCHAR unchanged in 3.0.

The VARCHAR type is so widely used in databases that it would be weird if Spark didn't support it. VARCHAR is exactly the same as Spark's StringType when the length limitation is not hit, and I'm fine with temporarily leaving this flaw in 3.0: users may hit behavior changes when their string values hit the VARCHAR length limitation.

3. Finalize the VARCHAR behavior in 3.1.

For now I have 2 ideas:
a) Make VARCHAR(x) a first-class data type. This means Spark data sources should support VARCHAR, and CREATE TABLE should fail if a column is of VARCHAR type and the underlying data source doesn't support it (e.g. JSON/CSV). Type cast, type coercion, table insertion, etc. should be updated as well.
b) Simply document that the underlying data source may or may not enforce the length limitation of VARCHAR(x).

Please let me know if you have different ideas.

Thanks,
Wenchen

On Wed, Mar 18, 2020 at 1:08 AM Michael Armbrust <mich...@databricks.com> wrote:

>> What I'd oppose is to just ban char for the native data sources, and do
>> not have a plan to address this problem systematically.
>
> +1
>
>> Just forget about padding, like what Snowflake and MySQL have done.
>> Document that char(x) is just an alias for string. And then move on. Almost
>> no work needs to be done...
>
> +1