−User char barely showed up (honestly negligible). I was comparing select vs select.
On Mon, Mar 16, 2020 at 5:40 PM, Dongjoon Hyun < dongjoon.h...@gmail.com > wrote: > > Ur, are you comparing the number of SELECT statement with TRIM and CREATE > statements with `CHAR`? > > > I looked up our usage logs (sorry I can't share this publicly) and trim > has at least four orders of magnitude higher usage than char. > > We need to discuss more about what to do. This thread is what I expected > exactly. :) > > > BTW I'm not opposing us sticking to SQL standard (I'm in general for > it). I was merely pointing out that if we deviate away from SQL standard > in any way we are considered "wrong" or "incorrect". That argument itself > is flawed when plenty of other popular database systems also deviate away > from the standard on this specific behavior. > > > Bests, > Dongjoon. > > On Mon, Mar 16, 2020 at 5:35 PM Reynold Xin < rxin@ databricks. com ( > r...@databricks.com ) > wrote: > > >> BTW I'm not opposing us sticking to SQL standard (I'm in general for it). >> I was merely pointing out that if we deviate away from SQL standard in any >> way we are considered "wrong" or "incorrect". That argument itself is >> flawed when plenty of other popular database systems also deviate away >> from the standard on this specific behavior. >> >> >> >> >> >> >> >> >> On Mon, Mar 16, 2020 at 5:29 PM, Reynold Xin < rxin@ databricks. com ( >> r...@databricks.com ) > wrote: >> >>> I looked up our usage logs (sorry I can't share this publicly) and trim >>> has at least four orders of magnitude higher usage than char. >>> >>> >>> >>> >>> On Mon, Mar 16, 2020 at 5:27 PM, Dongjoon Hyun < dongjoon. hyun@ gmail. com >>> ( dongjoon.h...@gmail.com ) > wrote: >>> >>>> Thank you, Stephen and Reynold. >>>> >>>> >>>> To Reynold. >>>> >>>> >>>> The way I see the following is a little different. >>>> >>>> >>>> > CHAR is an undocumented data type without clearly defined >>>> semantics. >>>> >>>> Let me describe in Apache Spark User's View point. >>>> >>>> >>>> Apache Spark started to claim `HiveContext` (and `hql/hiveql` function) at >>>> Apache Spark 1.x without much documentation. In addition, there still >>>> exists an effort which is trying to keep it in 3.0.0 age. >>>> >>>> https:/ / issues. apache. org/ jira/ browse/ SPARK-31088 ( >>>> https://issues.apache.org/jira/browse/SPARK-31088 ) >>>> Add back HiveContext and createExternalTable >>>> >>>> Historically, we tried to make many SQL-based customer migrate their >>>> workloads from Apache Hive into Apache Spark through `HiveContext`. >>>> >>>> Although Apache Spark didn't have a good document about the inconsistent >>>> behavior among its data sources, Apache Hive has been providing its >>>> documentation and many customers rely the behavior. >>>> >>>> - https:/ / cwiki. apache. org/ confluence/ display/ Hive/ >>>> LanguageManual+Types >>>> ( https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types ) >>>> >>>> At that time, frequently in on-prem Hadoop clusters by well-known vendors, >>>> many existing huge tables were created by Apache Hive, not Apache Spark. >>>> And, Apache Spark is used for boosting SQL performance with its *caching*. >>>> This was true because Apache Spark was added into the Hadoop-vendor >>>> products later than Apache Hive. >>>> >>>> >>>> Until the turning point at Apache Spark 2.0, we tried to catch up more >>>> features to be consistent at least with Hive tables in Apache Hive and >>>> Apache Spark because two SQL engines share the same tables. >>>> >>>> For the following, technically, while Apache Hive doesn't changed its >>>> existing behavior in this part, Apache Spark evolves inevitably by moving >>>> away from the original Apache Spark old behaviors one-by-one. >>>> >>>> >>>> > the value is already fucked up >>>> >>>> >>>> The following is the change log. >>>> >>>> - When we switched the default value of `convertMetastoreParquet`. >>>> (at Apache Spark 1.2) >>>> - When we switched the default value of `convertMetastoreOrc` (at >>>> Apache Spark 2.4) >>>> - When we switched `CREATE TABLE` itself. (Change `TEXT` table to >>>> `PARQUET` table at Apache Spark 3.0) >>>> >>>> To sum up, this has been a well-known issue in the community and among the >>>> customers. >>>> >>>> Bests, >>>> Dongjoon. >>>> >>>> On Mon, Mar 16, 2020 at 5:24 PM Stephen Coy < scoy@ infomedia. com. au ( >>>> s...@infomedia.com.au ) > wrote: >>>> >>>> >>>>> Hi there, >>>>> >>>>> >>>>> I’m kind of new around here, but I have had experience with all of all the >>>>> so called “big iron” databases such as Oracle, IBM DB2 and Microsoft SQL >>>>> Server as well as Postgresql. >>>>> >>>>> >>>>> They all support the notion of “ANSI padding” for CHAR columns - which >>>>> means that such columns are always space padded, and they default to >>>>> having this enabled (for ANSI compliance). >>>>> >>>>> >>>>> MySQL also supports it, but it defaults to leaving it disabled for >>>>> historical reasons not unlike what we have here. >>>>> >>>>> >>>>> In my opinion we should push toward standards compliance where possible >>>>> and then document where it cannot work. >>>>> >>>>> >>>>> If users don’t like the padding on CHAR columns then they should change to >>>>> VARCHAR - I believe that was its purpose in the first place, and it does >>>>> not dictate any sort of “padding". >>>>> >>>>> >>>>> I can see why you might “ban” the use of CHAR columns where they cannot be >>>>> consistently supported, but VARCHAR is a different animal and I would >>>>> expect it to work consistently everywhere. >>>>> >>>>> >>>>> >>>>> >>>>> Cheers, >>>>> >>>>> >>>>> Steve C >>>>> >>>>> >>>>>> On 17 Mar 2020, at 10:01 am, Dongjoon Hyun < dongjoon. hyun@ gmail. com ( >>>>>> dongjoon.h...@gmail.com ) > wrote: >>>>>> >>>>>> Hi, Reynold. >>>>>> (And +Michael Armbrust) >>>>>> >>>>>> >>>>>> If you think so, do you think it's okay that we change the return value >>>>>> silently? Then, I'm wondering why we reverted `TRIM` functions then? >>>>>> >>>>>> >>>>>> > Are we sure "not padding" is "incorrect"? >>>>>> >>>>>> >>>>>> >>>>>> Bests, >>>>>> Dongjoon. >>>>>> >>>>>> >>>>>> >>>>>> On Sun, Mar 15, 2020 at 11:15 PM Gourav Sengupta < gourav. sengupta@ >>>>>> gmail. >>>>>> com ( gourav.sengu...@gmail.com ) > wrote: >>>>>> >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> >>>>>>> 100% agree with Reynold. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Regards, >>>>>>> Gourav Sengupta >>>>>>> >>>>>>> >>>>>>> On Mon, Mar 16, 2020 at 3:31 AM Reynold Xin < rxin@ databricks. com ( >>>>>>> r...@databricks.com ) > wrote: >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Are we sure "not padding" is "incorrect"? >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> I don't know whether ANSI SQL actually requires padding, but plenty of >>>>>>>> databases don't actually pad. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> https:/ / docs. snowflake. net/ manuals/ sql-reference/ >>>>>>>> data-types-text. html >>>>>>>> ( >>>>>>>> https://aus01.safelinks.protection.outlook.com/?url=https:%2F%2Fdocs.snowflake.net%2Fmanuals%2Fsql-reference%2Fdata-types-text.html%23:~:text%3DCHAR%2520%252C%2520CHARACTER%2C(1)%2520is%2520the%2520default.%26text%3DSnowflake%2520currently%2520deviates%2520from%2520common%2Cspace-padded%2520at%2520the%2520end.&data=02%7C01%7Cscoy%40infomedia.com.au%7C5346c8d2675342008b5708d7c9fdff54%7C45d5407150f849caa59f9457123dc71c%7C0%7C0%7C637199965062044368&sdata=BvnZTTPTZBAi8oGWIvJk2fC%2FYSgdvq%2BAxtOj0nVzufk%3D&reserved=0 >>>>>>>> ) : "Snowflake currently deviates from common CHAR semantics in that >>>>>>>> strings shorter than the maximum length are not space-padded at the >>>>>>>> end." >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> MySQL: https:/ / stackoverflow. com/ questions/ 53528645/ >>>>>>>> why-char-dont-have-padding-in-mysql >>>>>>>> ( >>>>>>>> https://aus01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstackoverflow.com%2Fquestions%2F53528645%2Fwhy-char-dont-have-padding-in-mysql&data=02%7C01%7Cscoy%40infomedia.com.au%7C5346c8d2675342008b5708d7c9fdff54%7C45d5407150f849caa59f9457123dc71c%7C0%7C0%7C637199965062044368&sdata=3OGLht%2Fa28GcKhAGwJPXIR%2BMODiIwXGVuNuResZqwXM%3D&reserved=0 >>>>>>>> ) >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Sun, Mar 15, 2020 at 7:02 PM, Dongjoon Hyun < dongjoon. hyun@ >>>>>>>> gmail. com >>>>>>>> ( dongjoon.h...@gmail.com ) > wrote: >>>>>>>> >>>>>>>>> Hi, Reynold. >>>>>>>>> >>>>>>>>> >>>>>>>>> Please see the following for the context. >>>>>>>>> >>>>>>>>> >>>>>>>>> https:/ / issues. apache. org/ jira/ browse/ SPARK-31136 ( >>>>>>>>> https://aus01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FSPARK-31136&data=02%7C01%7Cscoy%40infomedia.com.au%7C5346c8d2675342008b5708d7c9fdff54%7C45d5407150f849caa59f9457123dc71c%7C0%7C0%7C637199965062054364&sdata=pWQ9QhfVY4Uzyc8oIJ1QONQ0zOBAQ2DGSemyBj%2BvFeM%3D&reserved=0 >>>>>>>>> ) >>>>>>>>> "Revert SPARK-30098 Use default datasource as provider for CREATE >>>>>>>>> TABLE >>>>>>>>> syntax" >>>>>>>>> >>>>>>>>> >>>>>>>>> I raised the above issue according to the new rubric, and the banning >>>>>>>>> was >>>>>>>>> the proposed alternative to reduce the potential issue. >>>>>>>>> >>>>>>>>> >>>>>>>>> Please give us your opinion since it's still PR. >>>>>>>>> >>>>>>>>> >>>>>>>>> Bests, >>>>>>>>> Dongjoon. >>>>>>>>> >>>>>>>>> On Sat, Mar 14, 2020 at 17:54 Reynold Xin < rxin@ databricks. com ( >>>>>>>>> r...@databricks.com ) > wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>>> I don’t understand this change. Wouldn’t this “ban” confuse the hell >>>>>>>>>> out >>>>>>>>>> of both new and old users? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> For old users, their old code that was working for char(3) would now >>>>>>>>>> stop >>>>>>>>>> working. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> For new users, depending on whether the underlying metastore char(3) >>>>>>>>>> is >>>>>>>>>> either supported but different from ansi Sql (which is not that big >>>>>>>>>> of a >>>>>>>>>> deal if we explain it) or not supported. >>>>>>>>>> >>>>>>>>>> On Sat, Mar 14, 2020 at 3:51 PM Dongjoon Hyun < dongjoon. hyun@ >>>>>>>>>> gmail. com >>>>>>>>>> ( dongjoon.h...@gmail.com ) > wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Hi, All. >>>>>>>>>>> >>>>>>>>>>> Apache Spark has been suffered from a known consistency issue on >>>>>>>>>>> `CHAR` >>>>>>>>>>> type behavior among its usages and configurations. However, the >>>>>>>>>>> evolution >>>>>>>>>>> direction has been gradually moving forward to be consistent inside >>>>>>>>>>> Apache >>>>>>>>>>> Spark because we don't have `CHAR` offically. The following is the >>>>>>>>>>> summary. >>>>>>>>>>> >>>>>>>>>>> With 1.6.x ~ 2.3.x, `STORED PARQUET` has the following different >>>>>>>>>>> result. >>>>>>>>>>> (`spark.sql.hive.convertMetastoreParquet=false` provides a fallback >>>>>>>>>>> to >>>>>>>>>>> Hive behavior.) >>>>>>>>>>> >>>>>>>>>>> spark-sql> CREATE TABLE t1(a CHAR(3)); >>>>>>>>>>> spark-sql> CREATE TABLE t2(a CHAR(3)) STORED AS ORC; >>>>>>>>>>> spark-sql> CREATE TABLE t3(a CHAR(3)) STORED AS PARQUET; >>>>>>>>>>> >>>>>>>>>>> spark-sql> INSERT INTO TABLE t1 SELECT 'a '; >>>>>>>>>>> spark-sql> INSERT INTO TABLE t2 SELECT 'a '; >>>>>>>>>>> spark-sql> INSERT INTO TABLE t3 SELECT 'a '; >>>>>>>>>>> >>>>>>>>>>> spark-sql> SELECT a, length(a) FROM t1; >>>>>>>>>>> a 3 >>>>>>>>>>> spark-sql> SELECT a, length(a) FROM t2; >>>>>>>>>>> a 3 >>>>>>>>>>> spark-sql> SELECT a, length(a) FROM t3; >>>>>>>>>>> a 2 >>>>>>>>>>> >>>>>>>>>>> Since 2.4.0, `STORED AS ORC` became consistent. >>>>>>>>>>> (`spark.sql.hive.convertMetastoreOrc=false` provides a fallback to >>>>>>>>>>> Hive >>>>>>>>>>> behavior.) >>>>>>>>>>> >>>>>>>>>>> spark-sql> SELECT a, length(a) FROM t1; >>>>>>>>>>> a 3 >>>>>>>>>>> spark-sql> SELECT a, length(a) FROM t2; >>>>>>>>>>> a 2 >>>>>>>>>>> spark-sql> SELECT a, length(a) FROM t3; >>>>>>>>>>> a 2 >>>>>>>>>>> >>>>>>>>>>> Since 3.0.0-preview2, `CREATE TABLE` (without `STORED AS` clause) >>>>>>>>>>> became >>>>>>>>>>> consistent. >>>>>>>>>>> (`spark.sql.legacy.createHiveTableByDefault.enabled=true` provides a >>>>>>>>>>> fallback to Hive behavior.) >>>>>>>>>>> >>>>>>>>>>> spark-sql> SELECT a, length(a) FROM t1; >>>>>>>>>>> a 2 >>>>>>>>>>> spark-sql> SELECT a, length(a) FROM t2; >>>>>>>>>>> a 2 >>>>>>>>>>> spark-sql> SELECT a, length(a) FROM t3; >>>>>>>>>>> a 2 >>>>>>>>>>> >>>>>>>>>>> In addition, in 3.0.0, SPARK-31147 aims to ban `CHAR/VARCHAR` type >>>>>>>>>>> in the >>>>>>>>>>> following syntax to be safe. >>>>>>>>>>> >>>>>>>>>>> CREATE TABLE t(a CHAR(3)); >>>>>>>>>>> https:/ / github. com/ apache/ spark/ pull/ 27902 ( >>>>>>>>>>> https://aus01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fspark%2Fpull%2F27902&data=02%7C01%7Cscoy%40infomedia.com.au%7C5346c8d2675342008b5708d7c9fdff54%7C45d5407150f849caa59f9457123dc71c%7C0%7C0%7C637199965062054364&sdata=lhwUP5TcTtaO%2BLUTmx%2BPTjT0ASXPrQ7oKLL0N6EG0Ug%3D&reserved=0 >>>>>>>>>>> ) >>>>>>>>>>> >>>>>>>>>>> This email is sent out to inform you based on the new policy we >>>>>>>>>>> voted. >>>>>>>>>>> The recommendation is always using Apache Spark's native type >>>>>>>>>>> `String`. >>>>>>>>>>> >>>>>>>>>>> Bests, >>>>>>>>>>> Dongjoon. >>>>>>>>>>> >>>>>>>>>>> References: >>>>>>>>>>> 1. "CHAR implementation?", 2017/09/15 >>>>>>>>>>> https:/ / lists. apache. org/ thread. html/ >>>>>>>>>>> 96b004331d9762e356053b5c8c97e953e398e489d15e1b49e775702f%40%3Cdev. >>>>>>>>>>> spark. apache. org%3E ( >>>>>>>>>>> https://aus01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.apache.org%2Fthread.html%2F96b004331d9762e356053b5c8c97e953e398e489d15e1b49e775702f%2540%253Cdev.spark.apache.org%253E&data=02%7C01%7Cscoy%40infomedia.com.au%7C5346c8d2675342008b5708d7c9fdff54%7C45d5407150f849caa59f9457123dc71c%7C0%7C0%7C637199965062064358&sdata=6hkno6zKTkcIrO%2FJo4hTYihsYvNynMuWcxhzL0fZR68%3D&reserved=0 >>>>>>>>>>> ) >>>>>>>>>>> 2. "FYI: SPARK-30098 Use default datasource as provider for CREATE >>>>>>>>>>> TABLE >>>>>>>>>>> syntax", 2019/12/06 >>>>>>>>>>> https:/ / lists. apache. org/ thread. html/ >>>>>>>>>>> 493f88c10169680191791f9f6962fd16cd0ffa3b06726e92ed04cbe1%40%3Cdev. >>>>>>>>>>> spark. apache. org%3E ( >>>>>>>>>>> https://aus01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.apache.org%2Fthread.html%2F493f88c10169680191791f9f6962fd16cd0ffa3b06726e92ed04cbe1%2540%253Cdev.spark.apache.org%253E&data=02%7C01%7Cscoy%40infomedia.com.au%7C5346c8d2675342008b5708d7c9fdff54%7C45d5407150f849caa59f9457123dc71c%7C0%7C0%7C637199965062064358&sdata=QJnEU3mvUJff53Gw8F%2FAbxzd%2F8ZA1hhuoQwicX4ZXyI%3D&reserved=0 >>>>>>>>>>> ) >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> This email contains confidential information of and is the copyright of >>>>> Infomedia. It must not be forwarded, amended or disclosed without consent >>>>> of the sender. If you received this message by mistake, please advise the >>>>> sender and delete all copies. Security of transmission on the internet >>>>> cannot be guaranteed, could be infected, intercepted, or corrupted and you >>>>> should ensure you have suitable antivirus protection in place. By sending >>>>> us your or any third party personal details, you consent to (or confirm >>>>> you have obtained consent from such third parties) to Infomedia’s privacy >>>>> policy. http:/ / www. infomedia. com. au/ privacy-policy/ ( >>>>> http://www.infomedia.com.au/privacy-policy/ ) >>>>> >>>> >>>> >>> >>> >> >> > >
smime.p7s
Description: S/MIME Cryptographic Signature