[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine

Hive QA (JIRA) Fri, 27 Oct 2017 03:39:39 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16222137#comment-16222137
 ]


Hive QA commented on HIVE-17433:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12894263/HIVE-17433.08.patch

{color:green}SUCCESS:{color} +1 due to 46 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 37 failed/errored test(s), 11327 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_table_stats] 
(batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid] (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid_fast] 
(batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mergejoin] (batchId=58)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_no_row_serde] 
(batchId=68)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_parquet_projection]
 (batchId=42)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[acid_no_buckets]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketmapjoin1]
 (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketmapjoin2]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketmapjoin3]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] 
(batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_join_hash]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_udf]
 (batchId=166)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_distinct_gby]
 (batchId=163)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=172)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=100)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc]
 (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_input_format_excludes]
 (batchId=122)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_parquet_projection]
 (batchId=121)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_short_regress]
 (batchId=123)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_shufflejoin]
 (batchId=134)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut 
(batchId=205)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorDateExpressions.testVectorUDFWeekOfYear
 (batchId=276)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorMathFunctions.testVectorBin
 (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorMathFunctions.testVectorHex
 (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testRegex
 (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testStringLike
 (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testStringLikeMultiByte
 (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testStringLikePatternType
 (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testStringLikeRandomized
 (batchId=277)
org.apache.hadoop.hive.ql.optimizer.physical.TestVectorizer.testAggregateOnUDF 
(batchId=273)
org.apache.hadoop.hive.ql.optimizer.physical.TestVectorizer.testValidateNestedExpressions
 (batchId=273)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints 
(batchId=222)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7500/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7500/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7500/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 37 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12894263 - PreCommit-HIVE-Build

> Vectorization: Support Decimal64 in Hive Query Engine
> -----------------------------------------------------
>
>                 Key: HIVE-17433
>                 URL: https://issues.apache.org/jira/browse/HIVE-17433
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-17433.03.patch, HIVE-17433.04.patch, 
> HIVE-17433.05.patch, HIVE-17433.06.patch, HIVE-17433.07.patch, 
> HIVE-17433.08.patch
>
>
> Provide partial support for Decimal64 within Hive.  By partial I mean that 
> our current decimal has a large surface area of features (rounding, multiply, 
> divide, remainder, power, big precision, and many more) but only a small 
> number has been identified as being performance hotspots.
> Those are small precision decimals with precision <= 18 that fit within a 
> 64-bit long we are calling Decimal64 .  Just as we optimize row-mode 
> execution engine hotspots by selectively adding new vectorization code, we 
> can treat the current decimal as the full featured one and add additional 
> Decimal64 optimization where query benchmarks really show it help.
> This change creates a Decimal64ColumnVector.
> This change currently detects small decimal with Hive for Vectorized text 
> input format and uses some new Decimal64 vectorized classes for comparison, 
> addition, and later perhaps a few GroupBy aggregations like sum, avg, min, 
> max.
> The patch also supports a new annotation that can mark a 
> VectorizedInputFormat as supporting Decimal64 (it is called DECIMAL_64).  So, 
> in separate work those other formats such as ORC, PARQUET, etc can be done in 
> later JIRAs so they participate in the Decimal64 performance optimization.
> The idea is when you annotate your input format with:
> @VectorizedInputFormatSupports(supports = {DECIMAL_64})
> the Vectorizer in Hive will plan usage of Decimal64ColumnVector instead of 
> DecimalColumnVector.  Upon an input format seeing Decimal64ColumnVector being 
> used, the input format can fill that column vector with decimal64 longs 
> instead of HiveDecimalWritable objects of DecimalColumnVector.
> There will be a Hive environment variable 
> hive.vectorized.input.format.supports.enabled that has a string list of 
> supported features.  The default will start as "decimal_64".  It can be 
> turned off to allow for performance comparisons and testing.
> The query SELECT * FROM DECIMAL_6_1_txt where key - 100BD < 200BD ORDER BY 
> key, value
> Will have a vectorized explain plan looking like:
> ...
>             Filter Operator
>               Filter Vectorization:
>                   className: VectorFilterOperator
>                   native: true
>                   predicateExpression: 
> FilterDecimal64ColLessDecimal64Scalar(col 2, val 20000000)(children: 
> Decimal64ColSubtractDecimal64Scalar(col 0, val 10000000, 
> outputDecimal64AbsMax 99999999999) -> 2:decimal(11,5)/DECIMAL_64) -> boolean
>               predicate: ((key - 100) < 200) (type: boolean)
> ...



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine

Reply via email to