[
https://issues.apache.org/jira/browse/IMPALA-11829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17677231#comment-17677231
]
ASF subversion and git services commented on IMPALA-11829:
----------------------------------------------------------
Commit 92265e6f81572fef09fbc2cf51174611dc1d788a in impala's branch
refs/heads/master from David Rorke
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=92265e6f8 ]
IMPALA-11829 - Fix bug in cardinality estimates related to TABLE_NUM_ROWS hint
IMPALA-7942 added support for a TABLE_NUM_ROWS query hint which can be used
to specify a table cardinality for cases where stats are missing or invalid.
In the case where stats were missing or invalid and no TABLE_NUM_ROWS hint was
specified
by the user, HdfsScanNode.getStatsNumRows was incorrectly returning a default
value of -1
instead of returning a rough estimate of cardinality as it had prior to the
IMPALA-7942 change.
This change fixes the return value of getStatsNumRows so it only uses the
TABLE_NUM_ROWS
value when the users has actually specified the query hint.
Change-Id: Ia27745fd93abd5dec99bf82f16899bd15a2b88ae
Reviewed-on: http://gerrit.cloudera.org:8080/19421
Reviewed-by: Qifan Chen <[email protected]>
Reviewed-by: wangsheng <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Flaky TestCorruptTableStats.test_corrupt_stats
> ----------------------------------------------
>
> Key: IMPALA-11829
> URL: https://issues.apache.org/jira/browse/IMPALA-11829
> Project: IMPALA
> Issue Type: Test
> Components: Frontend
> Affects Versions: Impala 4.3.0
> Reporter: Tamas Mate
> Assignee: David Rorke
> Priority: Major
>
> TestCorruptTableStats.test_corrupt_stats is failing frequently with the
> following stack trace.
> {code:none}
> metadata/test_compute_stats.py:369: in test_corrupt_stats
> self.run_test_case('QueryTest/corrupt-stats', vector, unique_database)
> common/impala_test_suite.py:743: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:579: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:469: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:246: in verify_query_result_is_subset
> assert expected_literal_strings <= actual_literal_strings
> E assert Items in expected results not found in actual results:
> E ' row-size=0B cardinality=1'
> E Items in actual results:
> E '| row-size=8B cardinality=1'
> E 'the partition(s) is positive.'
> E ' partition predicates: org = 1'
> E 'PLAN-ROOT SINK'
> E ''
> E 'statistics when the corresponding tables are transactional.'
> E '| output: count(*)'
> E 'is either a) less than -1, or b) 0 but the size of all the files
> inside '
> E '00:SCAN HDFS [test_corrupt_stats_4d6cb186.corrupted]'
> E '03:AGGREGATE [FINALIZE]'
> E ' HDFS partitions=1/2 files=1 size=24B'
> E '01:AGGREGATE'
> E 'If it is suspected that there may be corrupt statistics, dropping and '
> E 'The latter case does not necessarily imply the existence of corrupt '
> E 'Max Per-Host Resource Reservation: Memory=8.00KB Threads=3'
> E ' row-size=0B cardinality=unavailable'
> E 'test_corrupt_stats_4d6cb186.corrupted'
> E 'The row count in one or more partitions in the following tables '
> E '| output: count:merge(*)'
> E 'Per-Host Resource Estimates: Memory=32MB'
> E '02:EXCHANGE [UNPARTITIONED]'
> E 're-computing statistics could resolve this problem.'
> E 'WARNING: The following tables are missing relevant table and/or column
> statistics.'
> E '|'
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]