[
https://issues.apache.org/jira/browse/HIVE-20523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16625287#comment-16625287
]
Hive QA commented on HIVE-20523:
--------------------------------
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12940982/HIVE-20523.3.patch
{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 90 failed/errored test(s), 14993 tests
executed
*Failed tests:*
{noformat}
TestMiniDruidCliDriver - did not produce a TEST-*.xml file (likely timed out)
(batchId=194)
[druidmini_masking.q,druidmini_test1.q,druidkafkamini_basic.q,druidmini_joins.q,druid_timestamptz.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_table_update_status]
(batchId=85)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[nested_column_pruning]
(batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_analyze]
(batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_array_null_element]
(batchId=79)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_complex_types_vectorization]
(batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_join]
(batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_map_null]
(batchId=90)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_map_type_vectorization]
(batchId=89)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_no_row_serde]
(batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_read_backward_compatible_files]
(batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_schema_evolution]
(batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_struct_type_vectorization]
(batchId=28)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_types_non_dictionary_encoding_vectorization]
(batchId=91)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_types_vectorization]
(batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_0]
(batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_10]
(batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_11]
(batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_12]
(batchId=25)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_13]
(batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_14]
(batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_15]
(batchId=92)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_16]
(batchId=87)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_17]
(batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_1]
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_2]
(batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_3]
(batchId=82)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_4]
(batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_5]
(batchId=75)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_6]
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_7]
(batchId=90)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_8]
(batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_9]
(batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_decimal_date]
(batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_div0]
(batchId=82)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_limit]
(batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_nested_udf]
(batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_not]
(batchId=83)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_offset_limit]
(batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_part]
(batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_part_project]
(batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_part_varchar]
(batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_pushdown]
(batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_numeric_overflows]
(batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_parquet_projection]
(batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_parquet_types]
(batchId=71)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan]
(batchId=170)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[strict_managed_tables_sysdb]
(batchId=171)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb]
(batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_partitioned_date_time]
(batchId=178)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_input_format_excludes]
(batchId=169)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_parquet]
(batchId=171)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
(batchId=187)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_join]
(batchId=118)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_0]
(batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_10]
(batchId=119)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_11]
(batchId=126)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_12]
(batchId=120)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_13]
(batchId=133)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_14]
(batchId=126)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_15]
(batchId=149)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_16]
(batchId=147)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_17]
(batchId=122)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_1]
(batchId=113)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_2]
(batchId=110)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_3]
(batchId=144)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_4]
(batchId=129)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_5]
(batchId=141)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_6]
(batchId=128)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_7]
(batchId=148)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_8]
(batchId=115)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_9]
(batchId=123)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_decimal_date]
(batchId=123)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_div0]
(batchId=145)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_limit]
(batchId=120)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_nested_udf]
(batchId=130)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_not]
(batchId=145)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_offset_limit]
(batchId=124)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_part]
(batchId=142)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_part_project]
(batchId=125)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_part_varchar]
(batchId=142)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_pushdown]
(batchId=124)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_input_format_excludes]
(batchId=130)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_parquet_projection]
(batchId=129)
org.apache.hadoop.hive.ql.io.parquet.TestParquetSerDe.testParquetHiveSerDe
(batchId=286)
org.apache.hive.hcatalog.pig.TestParquetHCatStorer.testStoreFuncAllSimpleTypes
(batchId=206)
org.apache.hive.hcatalog.pig.TestParquetHCatStorer.testWriteChar (batchId=206)
org.apache.hive.hcatalog.pig.TestParquetHCatStorer.testWriteVarchar
(batchId=206)
org.apache.hive.jdbc.TestJdbcWithMiniHS2ErasureCoding.testDescribeErasureCoding
(batchId=254)
org.apache.hive.jdbc.TestJdbcWithMiniHS2ErasureCoding.testExplainErasureCoding
(batchId=254)
{noformat}
Test results:
https://builds.apache.org/job/PreCommit-HIVE-Build/14007/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/14007/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-14007/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 90 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12940982 - PreCommit-HIVE-Build
> Improve table statistics for Parquet format
> -------------------------------------------
>
> Key: HIVE-20523
> URL: https://issues.apache.org/jira/browse/HIVE-20523
> Project: Hive
> Issue Type: Improvement
> Components: Physical Optimizer
> Reporter: George Pachitariu
> Assignee: George Pachitariu
> Priority: Minor
> Attachments: HIVE-20523.1.patch, HIVE-20523.2.patch,
> HIVE-20523.3.patch, HIVE-20523.patch
>
>
> Right now, in the table basic statistics, the *raw data size* for a row with
> any data type in the Parquet format is 1. This is an underestimate when
> columns are complex data structures, such as arrays.
> Underestimating the raw data size of a table makes Hive assign fewer
> containers (mappers/reducers) to queries over it, making them slower overall.
> Heavy underestimation can also lead Hive to choose a MapJoin instead of a
> ShuffleJoin, and the MapJoin can then fail with OOM errors.
> In this patch, I compute the column data sizes more accurately, taking
> complex structures into account. I followed the Writer implementation for the
> ORC format.
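> A minimal, self-contained sketch of the idea (plain Java with hypothetical
> names, not the actual patch code): instead of charging a flat 1 per value,
> recurse into lists, maps, and nested values and sum a per-primitive size
> estimate, similar in spirit to the ORC writer's accounting.
> {noformat}
> import java.util.Arrays;
> import java.util.List;
> import java.util.Map;
>
> public class RawDataSizeEstimator {
>
>   // Rough per-value size estimate; containers contribute the sum of their
>   // elements instead of a flat 1.
>   public static long estimate(Object value) {
>     if (value == null) {
>       return 0;
>     }
>     if (value instanceof List) {
>       long size = 0;
>       for (Object element : (List<?>) value) {
>         size += estimate(element);
>       }
>       return size;
>     }
>     if (value instanceof Map) {
>       long size = 0;
>       for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
>         size += estimate(entry.getKey()) + estimate(entry.getValue());
>       }
>       return size;
>     }
>     if (value instanceof String) {
>       return ((String) value).length();
>     }
>     if (value instanceof Long || value instanceof Double) {
>       return 8;
>     }
>     if (value instanceof Integer || value instanceof Float) {
>       return 4;
>     }
>     // Fallback for other primitive types; real code would handle each one.
>     return 1;
>   }
>
>   public static void main(String[] args) {
>     // A column holding an array of three strings now counts 1 + 2 + 3 = 6
>     // bytes of raw data instead of a flat 1.
>     System.out.println(estimate(Arrays.asList("a", "bb", "ccc")));
>   }
> }
> {noformat}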
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)