Is the Spark SQL implementation open source?
If it is, the algorithm it uses can be inferred from the code.

Mihai
________________________________
From: Cancai Cai <caic68...@gmail.com>
Sent: Monday, April 15, 2024 8:10 AM
To: dev@calcite.apache.org <dev@calcite.apache.org>
Subject: Optimize the type conversion of spark array function and map function in calcite

Hi, calcite community,

I have recently been testing Spark's map and array functions in
Calcite. I found that in some cases Spark's type conversion differs from
what we expect.

For example:

scala>  val df = spark.sql("select map_contains_key(map(1, 'a', 2, 'b'), 2.0)")
val df: org.apache.spark.sql.DataFrame = [map_contains_key(map(1, a,
2, b), 2.0): boolean]

scala> df.show()
+--------------------------------------+
|map_contains_key(map(1, a, 2, b), 2.0)|
+--------------------------------------+
|                                  true|
+--------------------------------------+

Mihai Budiu pointed out that Spark may be doing processing similar to this:

map_contains_key(map<Double, String>((Double)1, 'a', (Double)2, 'b'), 2.0)
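To illustrate why such a cast would change the result, here is a minimal, self-contained Java sketch. It does not use Spark or Calcite; the class name MapKeyWideningDemo and the helper widenKeys are hypothetical and only mimic the effect of casting the map's key type from INTEGER to DOUBLE before the lookup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MapKeyWideningDemo {
  /** Hypothetical helper: widens every Integer key to Double, mimicking a cast on the key type. */
  static Map<Double, String> widenKeys(Map<Integer, String> map) {
    Map<Double, String> widened = new LinkedHashMap<>();
    for (Map.Entry<Integer, String> e : map.entrySet()) {
      widened.put(e.getKey().doubleValue(), e.getValue());
    }
    return widened;
  }

  public static void main(String[] args) {
    Map<Integer, String> map = new LinkedHashMap<>();
    map.put(1, "a");
    map.put(2, "b");

    // Integer 2 and Double 2.0 are not equal under equals(), so the lookup misses.
    System.out.println(map.containsKey(2.0));            // false
    // After widening the keys to Double, the same lookup succeeds.
    System.out.println(widenKeys(map).containsKey(2.0)); // true
  }
}
```

Without the widening step the key types do not match and the lookup fails, which is one way an engine that skips the cast could end up disagreeing with the Spark output shown above.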

We can't say that Spark is wrong; we should adapt to this behavior. I am
considering adding an adjustTypeForMapContainsKey method that performs an
explicit conversion for this case. However, the issue is unlikely to be
limited to map_contains_key: we cannot guarantee that other related
functions, such as map_concat, do not have similar problems. Therefore, we
should find out what these functions have in common in their type
conversion and encapsulate that in a unified method, instead of adding a
similar adjust method for each function.
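As a rough sketch of what a unified method could look like, the Java below models types with a toy enum instead of Calcite's real type system. Everything in it (CollectionTypeCoercionSketch, NumericType, commonComponentType) is hypothetical, and whether Spark really applies the same widening rule to every map and array function is exactly what still has to be tested. In Calcite, RelDataTypeFactory.leastRestrictive might play the role of the toy leastRestrictive here, but that is an assumption.

```java
import java.util.Arrays;
import java.util.List;

final class CollectionTypeCoercionSketch {
  /** Toy numeric types, declared from narrowest to widest. */
  enum NumericType { INTEGER, BIGINT, DOUBLE }

  /** Returns the widest of the given types, i.e. a type all of them can be cast to. */
  static NumericType leastRestrictive(List<NumericType> types) {
    NumericType widest = types.get(0);
    for (NumericType t : types) {
      if (t.compareTo(widest) > 0) {
        widest = t;
      }
    }
    return widest;
  }

  /**
   * One shared rule for functions that compare a collection's component type
   * (map key or array element) with another operand.
   */
  static NumericType commonComponentType(NumericType componentType, NumericType argType) {
    return leastRestrictive(Arrays.asList(componentType, argType));
  }

  public static void main(String[] args) {
    // map_contains_key(map<INTEGER, ...>, DOUBLE): both sides are unified to DOUBLE.
    System.out.println(commonComponentType(NumericType.INTEGER, NumericType.DOUBLE)); // DOUBLE
  }
}
```

Each function would then only declare which operands must be unified and call the shared rule, rather than carrying its own adjust method.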

I think this should be done in three steps.

① Test the various cases of the map and array functions in Spark, and file a
Jira issue wherever Calcite is inconsistent with Spark's behavior.

② Summarize the characteristics these functions share and find out whether
they are related.

③ For the shared characteristics, use a single method to encapsulate the type
conversion.

These are my personal thoughts. I feel that this approach may make the
Calcite code easier to maintain.

Finally, thank you for reading.

Best wishes,

Cancai Cai
