jcamachor commented on a change in pull request #970: Hive 23100
URL: https://github.com/apache/hive/pull/970#discussion_r409981518
##########
File path:
ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java
##########
@@ -972,55 +1005,42 @@ protected T getXpathOrFuncExprNodeDesc(ASTNode expr,
}
}
if (genericUDF instanceof GenericUDFIn) {
-
- T columnDesc = children.get(0);
- List<T> outputOpList = children.subList(1, children.size());
- List<T> inOperands = new ArrayList<>(outputOpList);
- outputOpList.clear();
-
- boolean hasNullValue = false;
- for (T oldChild : inOperands) {
- if (oldChild == null) {
- hasNullValue = true;
- continue;
- }
- T newChild = interpretNodeAsStruct(columnDesc, oldChild);
- if (newChild == null) {
- hasNullValue = true;
- continue;
- }
- outputOpList.add(newChild);
- }
-
- if (hasNullValue) {
- T nullConst =
exprFactory.createConstantExpr(exprFactory.getTypeInfo(columnDesc), null);
- if (outputOpList.size() == 0) {
- // we have found only null values...remove the IN ; it will be
null all the time.
- return nullConst;
+ ListMultimap<TypeInfo, T> expressions = ArrayListMultimap.create();
+ for (int i = 1; i < children.size(); i++) {
+ T columnDesc = children.get(0);
+ T valueDesc = interpretNodeAsConstantStruct(columnDesc,
children.get(i));
+ if (valueDesc == null) {
+ TypeInfo targetType = exprFactory.getTypeInfo(columnDesc);
+ if (!expressions.containsKey(targetType)) {
+ expressions.put(targetType, columnDesc);
+ }
+ T nullConst = exprFactory.createConstantExpr(targetType, null);
+ expressions.put(targetType, nullConst);
+ } else {
+ TypeInfo targetType = exprFactory.getTypeInfo(valueDesc);
+ if (!expressions.containsKey(targetType)) {
+ expressions.put(targetType, columnDesc);
+ }
+ expressions.put(targetType, valueDesc);
}
- outputOpList.add(nullConst);
}
- if (!ctx.isCBOExecuted()) {
-
- HiveConf conf;
- try {
- conf = Hive.get().getConf();
- } catch (HiveException e) {
- throw new SemanticException(e);
- }
- if (children.size() <= HiveConf.getIntVar(conf,
HiveConf.ConfVars.HIVEOPT_TRANSFORM_IN_MAXNODES)) {
- List<T> orOperands =
exprFactory.rewriteINIntoORFuncCallExpr(children);
- if (orOperands != null) {
- if (orOperands.size() == 1) {
-
orOperands.add(exprFactory.createBooleanConstantExpr(Boolean.FALSE.toString()));
- }
- funcText = "or";
- genericUDF = new GenericUDFOPOr();
- children.clear();
- children.addAll(orOperands);
- }
+ children.clear();
+ List<T> newExprs = new ArrayList<>();
+ int numEntries = expressions.keySet().size();
+ if (numEntries == 1) {
+ children.addAll(expressions.asMap().values().iterator().next());
+ funcText = "in";
Review comment:
I am going to add more information here because I think all this part was
misleading, but here is the basic idea. We split the IN clause into different
IN clauses, one for each different value type. As you know, Hive and Calcite
treat types in IN clauses differently and it was practically impossible to find
some correct implementation with a common part unless this was done. Then I
rely on Calcite to merge the IN clauses if they are type compatible. This seems
to be working fine.
I think this also answers your questions below.
1) We may have multiple values per type (observe it is a multimap). This is
precisely what I want, to group those values by type.
2) In previous version, when IN clause had multiple null values, a single
null value was added. Now I add multiple null values (basically I omitted the
deduplication). I have added the deduplication back, although this can be done
by Calcite.
3) For the twisted part, the initialization needs be done within the
children processing because it depends on the type of the value itself, unless
it is null in which case it depends on the column type.
Could this logic be improved or rewritten? Definitely... But I can tell you
I work hard on this and the problem is that there are two many corner cases and
two different type systems, and it is very easy that even the most minimal
change produces incorrect results in any of them. That is why I came up with
this solution to split IN clause depending on value type.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]