You are joking when you say "informed widely and discussed in many ways twice", right?
This thread doesn't even talk about char/varchar: https://lists.apache.org/thread.html/493f88c10169680191791f9f6962fd16cd0ffa3b06726e92ed04cbe1%40%3Cdev.spark.apache.org%3E

(Yes it talked about changing the default data source provider, but that's just one of the ways we are exposing this char/varchar issue).

On Thu, Mar 19, 2020 at 8:41 PM, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

> +1 for Wenchen's suggestion.
>
> I believe that the difference and effects are informed widely and
> discussed in many ways twice.
>
> First, this was shared on last December.
>
> "FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE
> syntax", 2019/12/06
> https://lists.apache.org/thread.html/493f88c10169680191791f9f6962fd16cd0ffa3b06726e92ed04cbe1%40%3Cdev.spark.apache.org%3E
>
> Second (at this time in this thread), this has been discussed according to
> the new community rubric.
>
> - https://spark.apache.org/versioning-policy.html (Section: "Considerations
> When Breaking APIs")
>
> Thank you all.
>
> Bests,
> Dongjoon.
>
> On Tue, Mar 17, 2020 at 10:41 PM Wenchen Fan <cloud0fan@gmail.com> wrote:
>
>> OK let me put a proposal here:
>>
>> 1. Permanently ban CHAR for native data source tables, and only keep it
>> for Hive compatibility.
>> It's OK to forget about padding like what Snowflake and MySQL have done.
>> But it's hard for Spark to require consistent behavior about CHAR type in
>> all data sources. Since CHAR type is not that useful nowadays, seems OK to
>> just ban it. Another way is to document that the padding of CHAR type is
>> data source dependent, but it's a bit weird to leave this inconsistency in
>> Spark.
>>
>> 2. Leave VARCHAR unchanged in 3.0
>> VARCHAR type is so widely used in databases and it's weird if Spark
>> doesn't support it. VARCHAR type is exactly the same as Spark StringType
>> when the length limitation is not hit, and I'm fine to temporarily leave
>> this flaw in 3.0 and users may hit behavior changes when the string values
>> hit the VARCHAR length limitation.
>>
>> 3. Finalize the VARCHAR behavior in 3.1
>> For now I have 2 ideas:
>> a) Make VARCHAR(x) a first-class data type. This means Spark data sources
>> should support VARCHAR, and CREATE TABLE should fail if a column is
>> VARCHAR type and the underlying data source doesn't support it (e.g.
>> JSON/CSV). Type cast, type coercion, table insertion, etc. should be
>> updated as well.
>> b) Simply document that, the underlying data source may or may not enforce
>> the length limitation of VARCHAR(x).
>>
>> Please let me know if you have different ideas.
>>
>> Thanks,
>> Wenchen
>>
>> On Wed, Mar 18, 2020 at 1:08 AM Michael Armbrust <michael@databricks.com>
>> wrote:
>>
>>>> What I'd oppose is to just ban char for the native data sources, and do
>>>> not have a plan to address this problem systematically.
>>>
>>> +1
>>>
>>>> Just forget about padding, like what Snowflake and MySQL have done.
>>>> Document that char(x) is just an alias for string. And then move on.
>>>> Almost no work needs to be done...
>>>
>>> +1
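
To make the behaviors being argued about in the quoted proposal concrete, here is a rough sketch in plain Python. It is not Spark code and none of these function names exist in Spark; they are hypothetical helpers that model, in a simplified way, the three semantics on the table: CHAR(n) with padding, VARCHAR(n) with the length limit enforced, and "just an alias for string". Real engines differ in details such as how trailing spaces on overlong values are treated.

```python
# Illustrative only: simplified plain-Python models of the semantics under
# discussion. These are NOT Spark APIs; every name below is hypothetical.


def char_padded(value: str, n: int) -> str:
    """CHAR(n) with padding: reject overlong values, right-pad short ones."""
    if len(value) > n:
        raise ValueError(f"value of length {len(value)} exceeds CHAR({n})")
    return value.ljust(n)  # pad with spaces up to exactly n characters


def varchar_enforced(value: str, n: int) -> str:
    """VARCHAR(n) with the limit enforced: no padding, reject overlong values."""
    if len(value) > n:
        raise ValueError(f"value of length {len(value)} exceeds VARCHAR({n})")
    return value


def as_plain_string(value: str, n: int) -> str:
    """CHAR(n)/VARCHAR(n) treated as an alias for string: n is ignored."""
    return value


if __name__ == "__main__":
    print(repr(char_padded("ab", 5)))
    print(repr(varchar_enforced("ab", 5)))
    print(repr(as_plain_string("abcdefgh", 5)))
```

Under this model the three options only diverge in two cases: short values (padding or not) and values longer than n (error or silently accepted). That second case is where a data source that enforces the limit and one that doesn't would give users different results for the same table definition.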