[
https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16217539#comment-16217539
]
Matt McCline edited comment on HIVE-17433 at 10/24/17 8:11 PM:
---------------------------------------------------------------
Known Wrong Vectorization Results on Master:
HIVE-17893: Vectorization: Wrong results for vector_udf3.q
HIVE-17892: Vectorization: Wrong results for vectorized_timestamp_funcs.q
HIVE-17890: Vectorization: Wrong results for vectorized_case.q
HIVE-17889: Vectorization: Wrong results for vectorization_15.q
HIVE-17863: Vectorization: Two Q files produce wrong PTF query results
HIVE-17123: Vectorization: Wrong results for vector_groupby_cube1.q
HIVE-16919: Vectorization: vectorization_short_regress.q has query result
differences with non-vectorized run. Vectorized unary function broken?
HIVE-17895: Vectorization: Wrong results for schema_evol_text_vec_table.q (LLAP)
HIVE-17894: Vectorization: Wrong results for dynpart_sort_opt_vectorization.q
(LLAP)
was (Author: mmccline):
Known Wrong Vectorization Results on Master:
HIVE-17893: Vectorization: Wrong results for vector_udf3.q
HIVE-17892: Vectorization: Wrong results for vectorized_timestamp_funcs.q
HIVE-17890: Vectorization: Wrong results for vectorized_case.q
HIVE-17889: Vectorization: Wrong results for vectorization_15.q
HIVE-17863: Vectorization: Two Q files produce wrong PTF query results
HIVE-17123: Vectorization: Wrong results for vector_groupby_cube1.q
> Vectorization: Support Decimal64 in Hive Query Engine
> -----------------------------------------------------
>
> Key: HIVE-17433
> URL: https://issues.apache.org/jira/browse/HIVE-17433
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Critical
> Attachments: HIVE-17433.03.patch, HIVE-17433.04.patch,
> HIVE-17433.05.patch
>
>
> Provide partial support for Decimal64 within Hive. By partial I mean that
> our current decimal has a large surface area of features (rounding, multiply,
> divide, remainder, power, big precision, and many more), but only a small
> number of them have been identified as performance hotspots.
> Those are small-precision decimals with precision <= 18 that fit within a
> 64-bit long, which we are calling Decimal64. Just as we optimize row-mode
> execution engine hotspots by selectively adding new vectorization code, we
> can treat the current decimal as the full-featured one and add Decimal64
> optimizations where query benchmarks really show that they help.
> This change creates a Decimal64ColumnVector.
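
As a rough illustration of the idea above (not code from the patch), a decimal
with precision <= 18 can be held as an unscaled long at a fixed scale, because
10^18 - 1 is below Long.MAX_VALUE. The class and method names here are
hypothetical and are not Hive APIs.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper for illustration only; not a Hive class.
final class Decimal64Sketch {
    // Every unscaled value with at most 18 digits fits in a signed 64-bit long:
    // 10^18 - 1 < Long.MAX_VALUE (roughly 9.22 * 10^18).
    static final int MAX_DECIMAL64_PRECISION = 18;

    static boolean fitsInDecimal64(int precision) {
        return precision >= 1 && precision <= MAX_DECIMAL64_PRECISION;
    }

    // Encode a value as an unscaled long at a fixed scale.
    // setScale throws ArithmeticException if rounding would be needed;
    // longValueExact throws if the unscaled value does not fit in a long.
    static long toDecimal64(BigDecimal value, int scale) {
        return value.setScale(scale, RoundingMode.UNNECESSARY)
                    .unscaledValue()
                    .longValueExact();
    }

    // Decode an unscaled long back into a BigDecimal with the given scale.
    static BigDecimal fromDecimal64(long unscaled, int scale) {
        return BigDecimal.valueOf(unscaled, scale);
    }

    public static void main(String[] args) {
        long encoded = toDecimal64(new BigDecimal("1.5"), 5);
        System.out.println(encoded);                   // 150000
        System.out.println(fromDecimal64(encoded, 5)); // 1.50000
    }
}
```
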
> This change currently detects small decimals for the vectorized text
> input format and uses some new Decimal64 vectorized classes for comparison
> and addition, and later perhaps for a few GroupBy aggregations like sum, avg,
> min, and max.
> The patch also supports a new annotation that can mark a
> VectorizedInputFormat as supporting Decimal64 (it is called DECIMAL_64).
> Other formats such as ORC, PARQUET, etc. can then be handled separately in
> later JIRAs so that they participate in the Decimal64 performance
> optimization.
> The idea is that when you annotate your input format with:
> @VectorizedInputFormatSupports(supports = {DECIMAL_64})
> the Vectorizer in Hive will plan usage of Decimal64ColumnVector instead of
> DecimalColumnVector. When an input format sees that Decimal64ColumnVector is
> being used, it can fill that column vector with decimal64 longs
> instead of the HiveDecimalWritable objects of DecimalColumnVector.
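
The annotation-driven choice described above might look roughly like the
sketch below. The annotation and enum are local stand-ins declared only so the
example compiles; their shape is an assumption based on the usage shown in the
description, and the real Hive types may differ. MyTextInputFormat,
LegacyInputFormat, and PlannerSketch are hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;

// Stand-in declarations for illustration; assumed shape, not the Hive source.
enum Support { DECIMAL_64 }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface VectorizedInputFormatSupports {
    Support[] supports() default {};
}

// Hypothetical input format that declares Decimal64 support.
@VectorizedInputFormatSupports(supports = {Support.DECIMAL_64})
class MyTextInputFormat { }

// Hypothetical input format with no declaration.
class LegacyInputFormat { }

final class PlannerSketch {
    // Pick the column vector class name the planner would use for small decimals.
    static String chooseDecimalVector(Class<?> inputFormat) {
        VectorizedInputFormatSupports declared =
            inputFormat.getAnnotation(VectorizedInputFormatSupports.class);
        boolean decimal64 = declared != null
            && Arrays.asList(declared.supports()).contains(Support.DECIMAL_64);
        return decimal64 ? "Decimal64ColumnVector" : "DecimalColumnVector";
    }

    public static void main(String[] args) {
        System.out.println(chooseDecimalVector(MyTextInputFormat.class)); // Decimal64ColumnVector
        System.out.println(chooseDecimalVector(LegacyInputFormat.class)); // DecimalColumnVector
    }
}
```
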
> There will be a Hive configuration variable
> hive.vectorized.input.format.supports.enabled that holds a string list of
> supported features. The default will start as "decimal_64". It can be
> turned off to allow for performance comparisons and testing.
> The query SELECT * FROM DECIMAL_6_1_txt where key - 100BD < 200BD ORDER BY
> key, value
> will have a vectorized explain plan looking like:
> ...
> Filter Operator
> Filter Vectorization:
> className: VectorFilterOperator
> native: true
> predicateExpression:
> FilterDecimal64ColLessDecimal64Scalar(col 2, val 20000000)(children:
> Decimal64ColSubtractDecimal64Scalar(col 0, val 10000000,
> outputDecimal64AbsMax 99999999999) -> 2:decimal(11,5)/DECIMAL_64) -> boolean
> predicate: ((key - 100) < 200) (type: boolean)
> ...
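
To make the constants in the plan above easier to read: at scale 5, 100BD
becomes the long 10000000, 200BD becomes 20000000, and 99999999999 is
10^11 - 1, the largest magnitude of a decimal(11,5). The sketch below
evaluates the same predicate over an array of scaled longs. It is an
illustration only; the class name, the sample keys, and the overflow handling
are assumptions, not the behavior of the Hive classes named in the plan.

```java
import java.util.Arrays;

// Hypothetical illustration of (key - 100) < 200 on scaled longs.
final class Decimal64FilterSketch {
    static final long SCALE_FACTOR = 100_000L;    // 10^5, for scale 5
    static final long ABS_MAX = 99_999_999_999L;  // 10^11 - 1, for decimal(11,5)

    // Returns the indexes of rows where (key - subtrahend) < bound.
    // All arguments are unscaled longs sharing the same scale.
    static int[] filter(long[] keys, long subtrahend, long bound) {
        int[] selected = new int[keys.length];
        int count = 0;
        for (int i = 0; i < keys.length; i++) {
            long diff = keys[i] - subtrahend;
            // Assumption: reject results outside the output type's range.
            if (Math.abs(diff) > ABS_MAX) {
                throw new ArithmeticException("result exceeds decimal(11,5)");
            }
            if (diff < bound) {
                selected[count++] = i;
            }
        }
        return Arrays.copyOf(selected, count);
    }

    public static void main(String[] args) {
        // Sample keys 150.5, 300, and 299.99999 encoded at scale 5.
        long[] keys = {15_050_000L, 30_000_000L, 29_999_999L};
        int[] rows = filter(keys, 100 * SCALE_FACTOR, 200 * SCALE_FACTOR);
        System.out.println(Arrays.toString(rows)); // [0, 2]
    }
}
```

The row holding 300 is dropped because 300 - 100 equals 200, which is not
strictly less than 200.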