[ https://issues.apache.org/jira/browse/HIVE-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HIVE-2249:
------------------------------
Attachment: HIVE-2249.D1383.1.patch
zhiqiu requested code review of "HIVE-2249 [jira] When creating constant
expression for numbers, try to infer type from another comparison operand,
instead of trying to use integer first, and then long and double".
Reviewers: njain, kevinwilfong, heyongqiang, JIRA
[jira] [HIVE-2249] Smart type inference of constants
This patch adds support for smartly inferring a constant's type when
encountering a query like "column CMP constant", where CMP can be any of
the comparators supported by Hive. The aim is to improve performance by
moving the type conversion from the runtime stage to the compile stage.
More specifically, smart type inference happens when the type of the
column is one of the following:
* TINYINT
* SMALLINT
* INT
* BIGINT
* FLOAT
* DOUBLE
If the column's type is one of the above, the constant on the other side
is first converted to the corresponding Java type:
* TINYINT => Byte
* SMALLINT => Short
* INT => Integer
* BIGINT => Long
* FLOAT => Float
* DOUBLE => Double
If that conversion fails, the constant is then converted to DOUBLE. If
both attempts fail, the constant keeps its original type.
One exception is when the column is STRING. In this case, if the constant
is BIGINT, it is left unchanged; otherwise, the constant is converted to
DOUBLE.
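The rules above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the code in TypeCheckProcFactory.java: the class, enum, and method names are hypothetical, the constant is passed both as its literal text and as the value it would otherwise have, and "the constant is BIGINT" is taken to mean that value is a Long.

```java
// Hypothetical sketch of the inference rules above; the names are
// illustrative and are not Hive's TypeCheckProcFactory API.
class ConstantTypeInference {

    // Column types from the list above, plus STRING for the exception case.
    enum ColumnType { TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING }

    /**
     * Converts a numeric constant for comparison against a column.
     *
     * @param columnType type of the column on the other side of the comparator
     * @param text       literal text of the constant, e.g. "10" or "1.5"
     * @param original   value the constant would otherwise have (Integer, Long or Double)
     */
    static Object inferConstant(ColumnType columnType, String text, Object original) {
        if (columnType == ColumnType.STRING) {
            // Exception: a BIGINT (Long) constant is left alone; other constants become DOUBLE.
            return (original instanceof Long) ? original : toDoubleOrOriginal(text, original);
        }
        try {
            // First attempt: the Java type matching the column's type.
            switch (columnType) {
                case TINYINT:  return Byte.valueOf(text);
                case SMALLINT: return Short.valueOf(text);
                case INT:      return Integer.valueOf(text);
                case BIGINT:   return Long.valueOf(text);
                case FLOAT:    return Float.valueOf(text);
                case DOUBLE:   return Double.valueOf(text);
                default:       return original;
            }
        } catch (NumberFormatException e) {
            // Second attempt: DOUBLE (e.g. "1.5" for an INT column, "200" for TINYINT).
            return toDoubleOrOriginal(text, original);
        }
    }

    // Falls back to DOUBLE; if even that fails, keeps the constant's original value.
    private static Object toDoubleOrOriginal(String text, Object original) {
        try {
            return Double.valueOf(text);
        } catch (NumberFormatException e) {
            return original;
        }
    }
}
```

With this sketch, a BIGINT column compared against "0" gets a Long constant, so no widening is needed at runtime, while a TINYINT column compared against "200" falls back to a Double because 200 does not fit in a byte.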
Another improvement is that a query like
"int_col = non_convertible_double_constant", such as "uid = 1.5", now
returns false immediately.
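A minimal sketch of the early-false check follows, assuming the predicate is an equality against an INT column and the constant has already been parsed as a double. The class and method names are hypothetical, and returning null to mean "cannot fold" is a choice made only for this sketch. Other comparators would need different handling, since a predicate such as "uid < 1.5" can still be true.

```java
// Hypothetical sketch of the early "false" for int_col = <double constant>.
// Names are illustrative; only equality is handled here.
class IntEqualityFolding {

    /**
     * Returns Boolean.FALSE when "int_col = constant" can never hold because the
     * constant has no exact INT value, or null when the comparison must still be
     * evaluated at runtime.
     */
    static Boolean foldIntEquality(double constant) {
        boolean isWhole = constant == Math.rint(constant);   // false for 1.5 and NaN
        boolean inIntRange = constant >= Integer.MIN_VALUE
                && constant <= Integer.MAX_VALUE;            // false for 1e10
        if (!isWhole || !inIntRange) {
            return Boolean.FALSE;  // e.g. uid = 1.5: no INT value can match
        }
        return null;               // e.g. uid = 2.0: keep the comparison (as uid = 2)
    }
}
```

In a WHERE clause, a NULL comparison result and false both filter the row out, so folding to false does not change which rows are returned there.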
NOTE:
About 130 unit test outputs had to be updated due to this diff. All
updates are limited to plan changes such as "col = 10" becoming
"col = 10.0", and each one was checked individually.
TWO test cases failed during unit testing:
* testCliDriver_insert2_overwrite_partitions
* testCliDriver_ppr_pushdown
On inspecting the queries and their output, the generated plans were
found to be the same, but the query results changed. Since the queries in
these two cases are simple SELECT queries, the default output ordering may
have been changed unintentionally, either by this diff or by other diffs.
Task ID: #620808
Blame Rev:
Here is the current code that builds a constant expression for numbers:
try {
  // Each successful parse overwrites v, so the narrowest type that fits wins.
  v = Double.valueOf(expr.getText());
  v = Long.valueOf(expr.getText());
  v = Integer.valueOf(expr.getText());
} catch (NumberFormatException e) {
  // do nothing here, we will throw an exception in the following block
}
if (v == null) {
  throw new SemanticException(ErrorMsg.INVALID_NUMERICAL_CONSTANT
      .getMsg(expr));
}
return new ExprNodeConstantDesc(v);
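To make the behavior of the quoted snippet concrete, here is a self-contained re-creation of it. The class and method names are hypothetical, and IllegalArgumentException stands in for Hive's SemanticException so the example compiles without Hive on the classpath.

```java
// Standalone re-creation of the cascade quoted above, for illustration only.
// Each successful parse overwrites v, so the narrowest of Integer, Long and
// Double that accepts the text wins; the first failure stops the cascade.
class ConstantCascade {

    static Object parseNumber(String text) {
        Object v = null;
        try {
            v = Double.valueOf(text);
            v = Long.valueOf(text);
            v = Integer.valueOf(text);
        } catch (NumberFormatException e) {
            // Keep whatever the last successful parse produced (or null).
        }
        if (v == null) {
            // Stand-in for SemanticException(ErrorMsg.INVALID_NUMERICAL_CONSTANT ...).
            throw new IllegalArgumentException("Invalid numerical constant: " + text);
        }
        return v;
    }

    public static void main(String[] args) {
        for (String text : new String[] {"0", "10000000000", "1.5"}) {
            Object v = parseNumber(text);
            System.out.println(text + " -> " + v.getClass().getSimpleName());
        }
    }
}
```

Running main prints "0 -> Integer", "10000000000 -> Long" and "1.5 -> Double". The choice depends only on the literal text, never on the column it is compared with, which is why "WHERE <BIG_INT_COLUMN> = 0" ends up comparing a BIGINT against an Integer constant.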
For a case like "WHERE <BIG_INT_COLUMN> = 0" or "WHERE <DOUBLE_COLUMN> = 0",
we always have to do a type conversion when comparing, which would be
unnecessary if we were slightly smarter about choosing the type when creating
the constant expression. We can simply walk one level up the tree, find the
other comparison operand, and use the same type as that operand if possible.
For an erroneous user query like '<INT_COLUMN>=1.1', we can do even more.
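The "walk one level up the tree" idea can be sketched like this. The Node class is a hypothetical stand-in for a parse-tree node (it is not Hive's ASTNode or ExprNodeDesc), and the operator list is an illustrative subset rather than Hive's full set of comparators.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of "walk one level up the tree". The Node class is a
// stand-in for a parse-tree node; it is not Hive's ASTNode or ExprNodeDesc.
class SiblingTypeLookup {

    // Illustrative subset of comparison operators.
    static final List<String> COMPARATORS =
            Arrays.asList("=", "==", "!=", "<>", "<", "<=", ">", ">=");

    static class Node {
        final String text;     // operator symbol, column name, or literal text
        final String colType;  // column type for column references, else null
        Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String text, String colType) {
            this.text = text;
            this.colType = colType;
        }

        // Attaches a child and returns this node so calls can be chained.
        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }
    }

    /**
     * Returns the type of the other operand of the comparison that directly
     * contains this constant, or null if it cannot be determined.
     */
    static String inferFromSibling(Node constant) {
        Node parent = constant.parent;
        if (parent == null || parent.children.size() != 2
                || !COMPARATORS.contains(parent.text)) {
            return null;  // not a binary comparison
        }
        Node other = parent.children.get(0) == constant
                ? parent.children.get(1)
                : parent.children.get(0);
        return other.colType;  // null when the other side is not a column
    }

    public static void main(String[] args) {
        Node column = new Node("bigint_col", "BIGINT");
        Node zero = new Node("0", null);
        new Node("=", null).add(column).add(zero);
        System.out.println(inferFromSibling(zero));  // prints BIGINT
    }
}
```

Only the direct parent is inspected, matching the "one level up" description; a constant nested inside arithmetic such as "int_col + 1" gets no inferred type and would keep the existing behavior.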
TEST PLAN
Run unit tests.
Revert Plan:
Tags:
REVISION DETAIL
https://reviews.facebook.net/D1383
AFFECTED FILES
contrib/src/test/results/clientpositive/dboutput.q.out
contrib/src/test/results/clientpositive/serde_typedbytes4.q.out
ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
ql/src/test/results/clientpositive/auto_join0.q.out
ql/src/test/results/clientpositive/auto_join11.q.out
ql/src/test/results/clientpositive/auto_join12.q.out
ql/src/test/results/clientpositive/auto_join13.q.out
ql/src/test/results/clientpositive/auto_join14.q.out
ql/src/test/results/clientpositive/auto_join16.q.out
ql/src/test/results/clientpositive/auto_join20.q.out
ql/src/test/results/clientpositive/auto_join21.q.out
ql/src/test/results/clientpositive/auto_join23.q.out
ql/src/test/results/clientpositive/auto_join27.q.out
ql/src/test/results/clientpositive/auto_join28.q.out
ql/src/test/results/clientpositive/auto_join29.q.out
ql/src/test/results/clientpositive/auto_join4.q.out
ql/src/test/results/clientpositive/auto_join5.q.out
ql/src/test/results/clientpositive/auto_join6.q.out
ql/src/test/results/clientpositive/auto_join7.q.out
ql/src/test/results/clientpositive/auto_join8.q.out
ql/src/test/results/clientpositive/cast1.q.out
ql/src/test/results/clientpositive/cluster.q.out
ql/src/test/results/clientpositive/create_view.q.out
ql/src/test/results/clientpositive/groupby_multi_single_reducer.q.out
ql/src/test/results/clientpositive/having.q.out
ql/src/test/results/clientpositive/index_auto.q.out
ql/src/test/results/clientpositive/index_auto_empty.q.out
ql/src/test/results/clientpositive/index_auto_file_format.q.out
ql/src/test/results/clientpositive/index_auto_mult_tables.q.out
ql/src/test/results/clientpositive/index_auto_mult_tables_compact.q.out
ql/src/test/results/clientpositive/index_auto_multiple.q.out
ql/src/test/results/clientpositive/index_auto_partitioned.q.out
ql/src/test/results/clientpositive/index_auto_self_join.q.out
ql/src/test/results/clientpositive/index_auto_unused.q.out
ql/src/test/results/clientpositive/index_auto_update.q.out
ql/src/test/results/clientpositive/index_bitmap3.q.out
ql/src/test/results/clientpositive/index_bitmap_auto.q.out
ql/src/test/results/clientpositive/index_bitmap_auto_partitioned.q.out
ql/src/test/results/clientpositive/index_bitmap_compression.q.out
ql/src/test/results/clientpositive/index_compression.q.out
ql/src/test/results/clientpositive/index_stale.q.out
ql/src/test/results/clientpositive/index_stale_partitioned.q.out
ql/src/test/results/clientpositive/input11.q.out
ql/src/test/results/clientpositive/input11_limit.q.out
ql/src/test/results/clientpositive/input12.q.out
ql/src/test/results/clientpositive/input13.q.out
ql/src/test/results/clientpositive/input14.q.out
ql/src/test/results/clientpositive/input14_limit.q.out
ql/src/test/results/clientpositive/input18.q.out
ql/src/test/results/clientpositive/input1_limit.q.out
ql/src/test/results/clientpositive/input2_limit.q.out
ql/src/test/results/clientpositive/input42.q.out
ql/src/test/results/clientpositive/input_part1.q.out
ql/src/test/results/clientpositive/input_part2.q.out
ql/src/test/results/clientpositive/input_part5.q.out
ql/src/test/results/clientpositive/input_part7.q.out
ql/src/test/results/clientpositive/join0.q.out
ql/src/test/results/clientpositive/join11.q.out
ql/src/test/results/clientpositive/join12.q.out
ql/src/test/results/clientpositive/join13.q.out
ql/src/test/results/clientpositive/join14.q.out
ql/src/test/results/clientpositive/join16.q.out
ql/src/test/results/clientpositive/join20.q.out
ql/src/test/results/clientpositive/join21.q.out
ql/src/test/results/clientpositive/join23.q.out
ql/src/test/results/clientpositive/join34.q.out
ql/src/test/results/clientpositive/join35.q.out
ql/src/test/results/clientpositive/join38.q.out
ql/src/test/results/clientpositive/join39.q.out
ql/src/test/results/clientpositive/join4.q.out
ql/src/test/results/clientpositive/join40.q.out
ql/src/test/results/clientpositive/join5.q.out
ql/src/test/results/clientpositive/join6.q.out
ql/src/test/results/clientpositive/join7.q.out
ql/src/test/results/clientpositive/join8.q.out
ql/src/test/results/clientpositive/load_dyn_part13.q.out
ql/src/test/results/clientpositive/louter_join_ppr.q.out
ql/src/test/results/clientpositive/multi_insert.q.out
ql/src/test/results/clientpositive/no_hooks.q.out
ql/src/test/results/clientpositive/noalias_subq1.q.out
ql/src/test/results/clientpositive/notable_alias1.q.out
ql/src/test/results/clientpositive/notable_alias2.q.out
ql/src/test/results/clientpositive/nullgroup.q.out
ql/src/test/results/clientpositive/nullgroup2.q.out
ql/src/test/results/clientpositive/nullgroup4.q.out
ql/src/test/results/clientpositive/nullgroup4_multi_distinct.q.out
ql/src/test/results/clientpositive/order2.q.out
ql/src/test/results/clientpositive/outer_join_ppr.q.out
ql/src/test/results/clientpositive/pcr.q.out
ql/src/test/results/clientpositive/ppd_clusterby.q.out
ql/src/test/results/clientpositive/ppd_multi_insert.q.out
ql/src/test/results/clientpositive/ppd_outer_join1.q.out
ql/src/test/results/clientpositive/ppd_transform.q.out
ql/src/test/results/clientpositive/ppd_udf_col.q.out
ql/src/test/results/clientpositive/ppr_pushdown3.q.out
ql/src/test/results/clientpositive/quote1.q.out
ql/src/test/results/clientpositive/rand_partitionpruner3.q.out
ql/src/test/results/clientpositive/rcfile_null_value.q.out
ql/src/test/results/clientpositive/regex_col.q.out
ql/src/test/results/clientpositive/regexp_extract.q.out
ql/src/test/results/clientpositive/router_join_ppr.q.out
ql/src/test/results/clientpositive/semijoin.q.out
ql/src/test/results/clientpositive/set_processor_namespaces.q.out
ql/src/test/results/clientpositive/skewjoin.q.out
ql/src/test/results/clientpositive/subq.q.out
ql/src/test/results/clientpositive/subq2.q.out
ql/src/test/results/clientpositive/transform_ppr1.q.out
ql/src/test/results/clientpositive/transform_ppr2.q.out
ql/src/test/results/clientpositive/udf1.q.out
ql/src/test/results/clientpositive/udf9.q.out
ql/src/test/results/clientpositive/udf_10_trims.q.out
ql/src/test/results/clientpositive/udf_hour.q.out
ql/src/test/results/clientpositive/udf_like.q.out
ql/src/test/results/clientpositive/udf_lower.q.out
ql/src/test/results/clientpositive/udf_minute.q.out
ql/src/test/results/clientpositive/udf_parse_url.q.out
ql/src/test/results/clientpositive/udf_second.q.out
ql/src/test/results/clientpositive/udf_union.q.out
ql/src/test/results/clientpositive/union.q.out
ql/src/test/results/clientpositive/union20.q.out
ql/src/test/results/clientpositive/union22.q.out
ql/src/test/results/clientpositive/union24.q.out
ql/src/test/results/clientpositive/union_ppr.q.out
ql/src/test/results/compiler/plan/cast1.q.xml
ql/src/test/results/compiler/plan/input1.q.xml
ql/src/test/results/compiler/plan/input2.q.xml
ql/src/test/results/compiler/plan/input3.q.xml
ql/src/test/results/compiler/plan/input4.q.xml
ql/src/test/results/compiler/plan/input_part1.q.xml
ql/src/test/results/compiler/plan/join4.q.xml
ql/src/test/results/compiler/plan/join5.q.xml
ql/src/test/results/compiler/plan/join6.q.xml
ql/src/test/results/compiler/plan/join7.q.xml
ql/src/test/results/compiler/plan/join8.q.xml
ql/src/test/results/compiler/plan/subq.q.xml
ql/src/test/results/compiler/plan/udf1.q.xml
ql/src/test/results/compiler/plan/union.q.xml
> When creating constant expression for numbers, try to infer type from another
> comparison operand, instead of trying to use integer first, and then long and
> double
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-2249
> URL: https://issues.apache.org/jira/browse/HIVE-2249
> Project: Hive
> Issue Type: Improvement
> Reporter: Siying Dong
> Assignee: Joseph Barillari
> Attachments: HIVE-2249.1.patch.txt, HIVE-2249.2.patch.txt,
> HIVE-2249.D1383.1.patch