[ 
https://issues.apache.org/jira/browse/DRILL-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604874#comment-17604874
 ] 

ASF GitHub Bot commented on DRILL-8136:
---------------------------------------

vvysotskyi commented on code in PR #2638:
URL: https://github.com/apache/drill/pull/2638#discussion_r971097106


##########
exec/java-exec/src/main/java/org/apache/drill/exec/resolver/ResolverTypePrecedence.java:
##########
@@ -21,148 +21,211 @@
 import java.util.HashSet;
 import java.util.Map;
 import java.util.Set;
+import java.util.TreeSet;
 
 import org.apache.drill.common.types.TypeProtos.MinorType;
+import 
org.apache.drill.shaded.guava.com.google.common.graph.ImmutableValueGraph;
+import org.apache.drill.shaded.guava.com.google.common.graph.ValueGraphBuilder;
 
 public class ResolverTypePrecedence {
 
+  // A weighted directed graph that represents the cost of casting between
+  // pairs of data types. The edge weights represent casting preferences and
+  // it is important to note that only some of these prefereces can be
+  // understood in terms of factors like computational cost or loss of
+  // precision. The others are derived from the expected behaviour of the query
+  // engine in the face of various data types and queries as expressed by the
+  // test suite.
+  //
+  // Bear in mind that this class only establishes which casts will be tried
+  // automatically and how they're ranked. See
+  // {@link org.apache.drill.exec.resolver.TypeCastRules} for listings of which
+  // casts are possible at all.
+  public static final ImmutableValueGraph<MinorType, Float> CAST_GRAPH = 
ValueGraphBuilder
+    .directed()
+    .<MinorType, Float>immutable()
+
+    // null type source vertex (null is castable to any type)
+    .putEdgeValue(MinorType.NULL, MinorType.VARCHAR, 1f)
+    .putEdgeValue(MinorType.NULL, MinorType.BIT, 2f)
+    .putEdgeValue(MinorType.NULL, MinorType.INT, 3f)
+    .putEdgeValue(MinorType.NULL, MinorType.FLOAT4, 4f)
+    .putEdgeValue(MinorType.NULL, MinorType.DECIMAL9, 5f)
+    .putEdgeValue(MinorType.NULL, MinorType.DATE, 6f)
+    .putEdgeValue(MinorType.NULL, MinorType.INTERVALDAY, 7f)
+    .putEdgeValue(MinorType.NULL, MinorType.MONEY, 8f)
+    .putEdgeValue(MinorType.NULL, MinorType.DICT, 9f)
+
+    // bit conversions
+    // prefer to cast VARCHAR to BIT than BIT to numerics
+    .putEdgeValue(MinorType.BIT, MinorType.TINYINT, 10f)
+    .putEdgeValue(MinorType.BIT, MinorType.UINT1, 10f)
+
+    // unsigned int widening
+    .putEdgeValue(MinorType.UINT1, MinorType.UINT2, 1f)
+    .putEdgeValue(MinorType.UINT2, MinorType.UINT4, 1f)
+    .putEdgeValue(MinorType.UINT4, MinorType.UINT8, 1f)
+    .putEdgeValue(MinorType.UINT8, MinorType.VARDECIMAL, 1f)
+    // unsigned int conversions
+    // prefer to cast UINTs to BIGINT over FLOAT4
+    .putEdgeValue(MinorType.UINT4, MinorType.BIGINT, 1f)
+    .putEdgeValue(MinorType.UINT4, MinorType.FLOAT4, 2f)
+    .putEdgeValue(MinorType.UINT8, MinorType.FLOAT4, 2f)
+
+    // int widening
+    .putEdgeValue(MinorType.TINYINT, MinorType.SMALLINT, 1f)
+    .putEdgeValue(MinorType.SMALLINT, MinorType.INT, 1f)
+    .putEdgeValue(MinorType.INT, MinorType.BIGINT, 1f)
+    // int conversions
+    .putEdgeValue(MinorType.BIGINT, MinorType.FLOAT4, 1f)
+    // prefer to cast INTs to FLOATs over VARDECIMALs
+    .putEdgeValue(MinorType.BIGINT, MinorType.VARDECIMAL, 2f)
+
+    // float widening
+    .putEdgeValue(MinorType.FLOAT4, MinorType.FLOAT8, 1f)
+    // float conversion
+    // FLOATs are not currently castable to VARDECIMAL (see TypeCastRules)
+    // but it is not possible to avoid some path between them here since
+    // FLOATs must ultimately be implcitly castable to VARCHAR, and VARCHAR
+    // to VARDECIMAL.
+    // prefer the cast in the opposite direction
+    .putEdgeValue(MinorType.FLOAT8, MinorType.VARDECIMAL, 20f)
+
+    // decimal widening
+    .putEdgeValue(MinorType.DECIMAL9, MinorType.DECIMAL18, 1f)
+    .putEdgeValue(MinorType.DECIMAL18, MinorType.DECIMAL28DENSE, 1f)
+    .putEdgeValue(MinorType.DECIMAL28DENSE, MinorType.DECIMAL28SPARSE, 1f)
+    .putEdgeValue(MinorType.DECIMAL28SPARSE, MinorType.DECIMAL38DENSE, 1f)
+    .putEdgeValue(MinorType.DECIMAL38DENSE, MinorType.DECIMAL38SPARSE, 1f)
+    .putEdgeValue(MinorType.DECIMAL38SPARSE, MinorType.VARDECIMAL, 1f)
+    .putEdgeValue(MinorType.MONEY, MinorType.VARDECIMAL, 1f)
+    // decimal conversions
+    // prefer to cast INTs to VARDECIMALs over VARDECIMALs to FLOATs
+    .putEdgeValue(MinorType.VARDECIMAL, MinorType.FLOAT4, 10f)
+    // prefer the casts in the opposite directions
+    .putEdgeValue(MinorType.VARDECIMAL, MinorType.INT, 11f)
+    .putEdgeValue(MinorType.VARDECIMAL, MinorType.VARCHAR, 12f)
+
+    // interval widening
+    .putEdgeValue(MinorType.INTERVALDAY, MinorType.INTERVALYEAR, 1f)
+    .putEdgeValue(MinorType.INTERVALYEAR, MinorType.INTERVAL, 1f)
+    // interval conversions
+    // prefer the cast in the opposite direction
+    .putEdgeValue(MinorType.INTERVAL, MinorType.VARCHAR, 10f)
+
+    // dict widening
+    .putEdgeValue(MinorType.DICT, MinorType.MAP, 1f)
+
+    // timestamp widening
+    .putEdgeValue(MinorType.DATE, MinorType.TIMESTAMP, 1f)
+    .putEdgeValue(MinorType.TIMESTAMP, MinorType.TIMESTAMPTZ, 1f)
+    .putEdgeValue(MinorType.TIME, MinorType.TIMETZ, 1f)
+    // timestamp conversions
+    // TIMESTAMP casting preference: DATE > TIME > VARCHAR
+    // prefer the casts in the opposite directions
+    .putEdgeValue(MinorType.TIMESTAMP, MinorType.DATE, 10f)
+    .putEdgeValue(MinorType.TIMESTAMP, MinorType.TIME, 11f)
+    .putEdgeValue(MinorType.TIMESTAMPTZ, MinorType.VARCHAR, 20f)
+    .putEdgeValue(MinorType.TIMETZ, MinorType.VARCHAR, 20f)
+
+    // char and binary widening
+    .putEdgeValue(MinorType.FIXEDCHAR, MinorType.VARCHAR, 1f)
+    .putEdgeValue(MinorType.FIXEDBINARY, MinorType.VARBINARY, 1f)
+    // char and binary conversions
+    .putEdgeValue(MinorType.VARCHAR, MinorType.INT, 1f)
+    .putEdgeValue(MinorType.VARCHAR, MinorType.FLOAT4, 2f)
+    .putEdgeValue(MinorType.VARCHAR, MinorType.VARDECIMAL, 3f)
+    .putEdgeValue(MinorType.VARCHAR, MinorType.TIMESTAMP, 4f)
+    .putEdgeValue(MinorType.VARCHAR, MinorType.INTERVALDAY, 5f)
+    .putEdgeValue(MinorType.VARCHAR, MinorType.BIT, 6f)
+    .putEdgeValue(MinorType.VARCHAR, MinorType.VARBINARY, 7f)
+    .putEdgeValue(MinorType.VARBINARY, MinorType.VARCHAR, 1f)
+
+    // union type sink vertex
+    .putEdgeValue(MinorType.LIST, MinorType.UNION, 1f)
+    .putEdgeValue(MinorType.MAP, MinorType.UNION, 1f)
+    .putEdgeValue(MinorType.VARBINARY, MinorType.UNION, 1f)
+
+    .build();
+
+  /**
+   * Searches the implicit casting graph for the path of least total cost using
+   * Dijkstra's algorithm.

Review Comment:
   Huh, it is good that we don't have negative distances, so it can be used.



##########
exec/java-exec/src/main/java/org/apache/drill/exec/resolver/ResolverTypePrecedence.java:
##########
@@ -21,148 +21,211 @@
 import java.util.HashSet;
 import java.util.Map;
 import java.util.Set;
+import java.util.TreeSet;
 
 import org.apache.drill.common.types.TypeProtos.MinorType;
+import 
org.apache.drill.shaded.guava.com.google.common.graph.ImmutableValueGraph;
+import org.apache.drill.shaded.guava.com.google.common.graph.ValueGraphBuilder;
 
 public class ResolverTypePrecedence {
 
+  // A weighted directed graph that represents the cost of casting between
+  // pairs of data types. The edge weights represent casting preferences and
+  // it is important to note that only some of these prefereces can be
+  // understood in terms of factors like computational cost or loss of
+  // precision. The others are derived from the expected behaviour of the query
+  // engine in the face of various data types and queries as expressed by the
+  // test suite.
+  //
+  // Bear in mind that this class only establishes which casts will be tried
+  // automatically and how they're ranked. See
+  // {@link org.apache.drill.exec.resolver.TypeCastRules} for listings of which
+  // casts are possible at all.
+  public static final ImmutableValueGraph<MinorType, Float> CAST_GRAPH = 
ValueGraphBuilder

Review Comment:
   Thanks for using ValueGraph, code with it looks cleaner!



##########
exec/java-exec/src/main/java/org/apache/drill/exec/resolver/DefaultFunctionResolver.java:
##########
@@ -58,27 +56,20 @@ public DrillFuncHolder getBestMatch(List<DrillFuncHolder> 
methods, FunctionCall
       }
     }
 
-    if (bestcost < 0) {
+    if (bestcost == Float.POSITIVE_INFINITY) {
       //did not find a matched func implementation, either w/ or w/o implicit 
casts
       //TODO: raise exception here?
       return null;
-    } else {
-      if (AssertionUtil.isAssertionsEnabled() && bestMatchAlternatives.size() 
> 0) {
-        /*
-         * There are other alternatives to the best match function which could 
have been selected
-         * Log the possible functions and the chose implementation and raise 
an exception
-         */
-        logger.error("Chosen function impl: " + bestmatch.toString());
-
-        // printing the possible matches
-        logger.error("Printing all the possible functions that could have 
matched: ");
-        for (DrillFuncHolder holder: bestMatchAlternatives) {
-          logger.error(holder.toString());
-        }
+    }
+    if (bestMatchAlternatives.size() > 0) {

Review Comment:
   Drill should throw an error for this case because it means that a randomly 
chosen function could be selected and return inconsistent results.





> Overhaul implict type casting logic
> -----------------------------------
>
>                 Key: DRILL-8136
>                 URL: https://issues.apache.org/jira/browse/DRILL-8136
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Esther Buchwalter
>            Assignee: James Turton
>            Priority: Minor
>
> The existing implicit casting system is built on simplistic total ordering of 
> data types[1] that yields oddities such as TINYINT being regarded as the 
> closest numeric type to VARCHAR or DATE the closest type to FLOAT8. This, in 
> turn, hurts the range of data types with which SQL functions can be used. 
> E.g. `select sqrt('3.1415926')` works in many RDBMSes but not in Drill while, 
> confusingly, `select '123' + 456` does work in Drill. In addition the 
> limitations of the existing type precedence list mean that it has been 
> supplmented with ad hoc secondary casting rules that go in the opposite 
> direction.
> This Issue proposes a new, more flexible definition of casting distance based 
> on a weighted directed graph built over the Drill data types.
> [1] 
> [https://drill.apache.org/docs/supported-data-types/#implicit-casting-precedence-of-data-types]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to