[ https://issues.apache.org/jira/browse/HIVE-27264?focusedWorklogId=858179&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858179 ]

ASF GitHub Bot logged work on HIVE-27264:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Apr/23 10:04
            Start Date: 20/Apr/23 10:04
    Worklog Time Spent: 10m 
      Work Description: kasakrisz commented on code in PR #4237:
URL: https://github.com/apache/hive/pull/4237#discussion_r1172371274


##########
ql/src/test/org/apache/hadoop/hive/ql/optimizer/calcite/rules/TestHivePointLookupOptimizerRule.java:
##########
@@ -348,4 +356,100 @@ public void testRecursionIsNotObstructed() {
         condition.toString());
   }
 
+  @Test
+  public void testSameVarcharLiteralDifferentPrecision() {
+
+    final RexBuilder rexBuilder = relBuilder.getRexBuilder();
+    RelDataType stringType30 = rexBuilder.getTypeFactory().createTypeWithCharsetAndCollation(
+            rexBuilder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR, 30),
+            Charset.forName(ConversionUtil.NATIVE_UTF16_CHARSET_NAME), SqlCollation.IMPLICIT);
+    RexNode lita30 = rexBuilder.makeLiteral(RexNodeExprFactory.makeHiveUnicodeString("AAA111"), stringType30, true);
+    RexNode litb30 = rexBuilder.makeLiteral(RexNodeExprFactory.makeHiveUnicodeString("BBB222"), stringType30, true);
+
+    RelDataType stringType14 = rexBuilder.getTypeFactory().createTypeWithCharsetAndCollation(
+            rexBuilder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR, 14),
+            Charset.forName(ConversionUtil.NATIVE_UTF16_CHARSET_NAME), SqlCollation.IMPLICIT);
+    RexNode lita14 = rexBuilder.makeLiteral(RexNodeExprFactory.makeHiveUnicodeString("AAA111"), stringType14, true);
+    RexNode litb14 = rexBuilder.makeLiteral(RexNodeExprFactory.makeHiveUnicodeString("BBB222"), stringType14, true);
+
+    final RelNode basePlan = relBuilder
+          .scan("t")
+          .filter(and(relBuilder,
+                  relBuilder.call(SqlStdOperatorTable.IN, relBuilder.field("f2"), lita30, litb30),
+                  relBuilder.call(SqlStdOperatorTable.IN, relBuilder.field("f2"), lita14, litb14)))
+          .build();
+
+    planner.setRoot(basePlan);
+    RelNode optimizedRelNode = planner.findBestExp();
+
+    HiveFilter filter = (HiveFilter) optimizedRelNode;
+    RexNode condition = filter.getCondition();
+    System.out.println(condition);
+    assertEquals("IN($1, " +
+                    "_UTF-16LE'AAA111':VARCHAR(30) CHARACTER SET \"UTF-16LE\", " +
+                    "_UTF-16LE'BBB222':VARCHAR(30) CHARACTER SET \"UTF-16LE\")",

Review Comment:
   Unfortunately, in Calcite 1.25 `RexSimplify` returns the input expression unchanged: it cannot recognize literals that have the same value and type but a different precision.
   I also tested a similar expression with Calcite 1.33:
   ```
   AND(OR(=($0, _UTF-16LE'AAA111'), =($0, _UTF-16LE'BBB222')), OR(=($0, _UTF-16LE'AAA111'), =($0, _UTF-16LE'BBB222')))
   ```
   and I got
   ```
   SEARCH($0, Sarg[_UTF-16LE'AAA111':VARCHAR(30) CHARACTER SET "UTF-16LE", _UTF-16LE'BBB222':VARCHAR(30) CHARACTER SET "UTF-16LE"]:VARCHAR(30) CHARACTER SET "UTF-16LE")
   ```
   
   In Calcite 1.33, an IN expression with constants is no longer represented by an `IN` `RexCall` but by `SEARCH`, so I had to rewrite the original expression as `OR`s; the literals still have different precisions.
   This time the expression was simplified.
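   To make the failure mode concrete, here is a minimal model in plain Java. It is not Hive's or Calcite's code. It assumes that the point lookup rule merges `x IN (...) AND x IN (...)` by intersecting the two constant lists. If literal equality also compares the type's precision, as the hypothetical `VarcharLiteral` record does, the intersection is empty and the conjunction collapses to FALSE, which would match the empty result in the bug report. Matching on the value alone and keeping the wider precision yields the `VARCHAR(30)` literals that the new test asserts. `InListIntersection`, `intersectStrict` and `intersectByValue` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class InListIntersection {

  // Hypothetical stand-in for a VARCHAR literal: value plus declared precision.
  // Record equality compares both components, so 'AAA111':VARCHAR(30) and
  // 'AAA111':VARCHAR(14) are NOT equal under this model.
  record VarcharLiteral(String value, int precision) {
    @Override
    public String toString() {
      return "'" + value + "':VARCHAR(" + precision + ")";
    }
  }

  // Naive merge of "x IN (a) AND x IN (b)": keep the literals of a that are
  // equal (value AND precision) to some literal of b.
  static List<VarcharLiteral> intersectStrict(List<VarcharLiteral> a, List<VarcharLiteral> b) {
    Set<VarcharLiteral> other = new LinkedHashSet<>(b);
    List<VarcharLiteral> result = new ArrayList<>();
    for (VarcharLiteral lit : a) {
      if (other.contains(lit)) {
        result.add(lit);
      }
    }
    return result;
  }

  // Precision-insensitive merge: match on the value only and give every
  // surviving literal the widest precision seen in either list.
  static List<VarcharLiteral> intersectByValue(List<VarcharLiteral> a, List<VarcharLiteral> b) {
    int widest = 0;
    for (VarcharLiteral lit : a) widest = Math.max(widest, lit.precision());
    for (VarcharLiteral lit : b) widest = Math.max(widest, lit.precision());
    Set<String> otherValues = new LinkedHashSet<>();
    for (VarcharLiteral lit : b) otherValues.add(lit.value());
    List<VarcharLiteral> result = new ArrayList<>();
    for (VarcharLiteral lit : a) {
      if (otherValues.contains(lit.value())) {
        result.add(new VarcharLiteral(lit.value(), widest));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<VarcharLiteral> in30 = List.of(new VarcharLiteral("AAA111", 30), new VarcharLiteral("BBB222", 30));
    List<VarcharLiteral> in14 = List.of(new VarcharLiteral("AAA111", 14), new VarcharLiteral("BBB222", 14));
    System.out.println(intersectStrict(in30, in14));   // [] : nothing survives, the AND folds to FALSE
    System.out.println(intersectByValue(in30, in14));  // ['AAA111':VARCHAR(30), 'BBB222':VARCHAR(30)]
  }
}
```

   Widening to the larger precision mirrors what both the Calcite 1.33 `Sarg` output and the test's expected string show (`VARCHAR(30)`). Narrowing to `VARCHAR(14)` would be the riskier choice in general, since a longer value from the wider list could not be represented.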





Issue Time Tracking
-------------------

    Worklog Id:     (was: 858179)
    Time Spent: 2h 20m  (was: 2h 10m)

> Literals in conjunction of two IN expressions are considered not equal if type precision is different
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-27264
>                 URL: https://issues.apache.org/jira/browse/HIVE-27264
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> {code}
> create table r_table (
>   string_col varchar(30)
> );
> create table l_table (
>   string_col varchar(14)
> );
> insert into r_table VALUES ('AAA111');
> insert into l_table VALUES ('AAA111');
> SELECT l_table.string_col from l_table, r_table
> WHERE r_table.string_col = l_table.string_col AND l_table.string_col IN 
> ('AAA111', 'BBB222') AND r_table.string_col IN ('AAA111', 'BBB222');
> {code}
> The query should return one row:
> {code}
> AAA111
> {code}
> but it returns an empty result set.
> Workaround:
> {code}
> set hive.optimize.point.lookup=false;
> {code}
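As a reference point for the expected result, the sketch below evaluates the repro query's semantics over in-memory copies of the two one-row tables. It assumes that `VARCHAR(14)` and `VARCHAR(30)` values compare by their string content, so the declared length plays no part in the join equality or in IN membership. `ReproQueryReference` and `evaluate` are hypothetical names. This is a plain nested-loop oracle, not how Hive executes the query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ReproQueryReference {

  // Constants from both IN lists of the repro query.
  static final Set<String> IN_LIST = Set.of("AAA111", "BBB222");

  // Reference semantics of:
  //   SELECT l_table.string_col FROM l_table, r_table
  //   WHERE r_table.string_col = l_table.string_col
  //     AND l_table.string_col IN ('AAA111', 'BBB222')
  //     AND r_table.string_col IN ('AAA111', 'BBB222')
  // Assumption: values compare by content; VARCHAR length does not matter.
  static List<String> evaluate(List<String> lTable, List<String> rTable) {
    List<String> result = new ArrayList<>();
    for (String l : lTable) {
      for (String r : rTable) {
        if (r.equals(l) && IN_LIST.contains(l) && IN_LIST.contains(r)) {
          result.add(l);
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> lTable = List.of("AAA111"); // l_table.string_col varchar(14)
    List<String> rTable = List.of("AAA111"); // r_table.string_col varchar(30)
    System.out.println(evaluate(lTable, rTable)); // [AAA111]
  }
}
```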



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
