[ 
https://issues.apache.org/jira/browse/IMPALA-11829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17677231#comment-17677231
 ] 

ASF subversion and git services commented on IMPALA-11829:
----------------------------------------------------------

Commit 92265e6f81572fef09fbc2cf51174611dc1d788a in impala's branch 
refs/heads/master from David Rorke
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=92265e6f8 ]

IMPALA-11829 - Fix bug in cardinality estimates related to TABLE_NUM_ROWS hint

IMPALA-7942 added support for a TABLE_NUM_ROWS query hint, which can be
used to specify a table cardinality when stats are missing or invalid.
When stats were missing or invalid and the user specified no
TABLE_NUM_ROWS hint, HdfsScanNode.getStatsNumRows incorrectly returned a
default value of -1 instead of the rough cardinality estimate it
returned before the IMPALA-7942 change.
This change fixes the return value of getStatsNumRows so that it uses the
TABLE_NUM_ROWS value only when the user has actually specified the query
hint.

Change-Id: Ia27745fd93abd5dec99bf82f16899bd15a2b88ae
Reviewed-on: http://gerrit.cloudera.org:8080/19421
Reviewed-by: Qifan Chen <[email protected]>
Reviewed-by: wangsheng <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
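
The selection logic described in the commit message can be sketched roughly
as follows. This is an illustrative Python model only, not the actual Java
code in HdfsScanNode: the function names, the parameters, and the rule that
a negative row count means "missing or invalid" are all assumptions made for
the example.

```python
from typing import Optional

# Hypothetical sentinel for "no hint given"; the commit message only
# says that a default value of -1 was returned.
HINT_UNSET = -1


def stats_num_rows_buggy(table_num_rows: int, hint_num_rows: Optional[int],
                         rough_estimate: int) -> int:
    """Model of the behavior described as the bug (assumed shape)."""
    if table_num_rows >= 0:
        return table_num_rows
    # Stats missing/invalid: the hint value is used even when no hint was
    # given, so the unset default (-1) leaks out as the row count.
    return hint_num_rows if hint_num_rows is not None else HINT_UNSET


def stats_num_rows_fixed(table_num_rows: int, hint_num_rows: Optional[int],
                         rough_estimate: int) -> int:
    """Model of the behavior described as the fix (assumed shape)."""
    if table_num_rows >= 0:
        return table_num_rows
    # Use TABLE_NUM_ROWS only when the user actually specified the hint.
    if hint_num_rows is not None:
        return hint_num_rows
    # Otherwise fall back to the rough cardinality estimate.
    return rough_estimate


if __name__ == "__main__":
    # Missing stats, no hint, rough estimate of 42 rows (made-up number).
    print(stats_num_rows_buggy(-1, None, 42))
    print(stats_num_rows_fixed(-1, None, 42))
```

In this model the two functions differ only in the case the commit message
singles out: stats missing or invalid and no hint supplied.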


> Flaky TestCorruptTableStats.test_corrupt_stats
> ----------------------------------------------
>
>                 Key: IMPALA-11829
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11829
>             Project: IMPALA
>          Issue Type: Test
>          Components: Frontend
>    Affects Versions: Impala 4.3.0
>            Reporter: Tamas Mate
>            Assignee: David Rorke
>            Priority: Major
>
> TestCorruptTableStats.test_corrupt_stats is failing frequently with the 
> following stack trace.
> {code:none}
> metadata/test_compute_stats.py:369: in test_corrupt_stats
>     self.run_test_case('QueryTest/corrupt-stats', vector, unique_database)
> common/impala_test_suite.py:743: in run_test_case
>     self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:579: in __verify_results_and_errors
>     replace_filenames_with_placeholder)
> common/test_result_verifier.py:469: in verify_raw_results
>     VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:246: in verify_query_result_is_subset
>     assert expected_literal_strings <= actual_literal_strings
> E   assert Items in expected results not found in actual results:
> E     '   row-size=0B cardinality=1'
> E     Items in actual results:
> E     '|  row-size=8B cardinality=1'
> E     'the partition(s) is positive.'
> E     '   partition predicates: org = 1'
> E     'PLAN-ROOT SINK'
> E     ''
> E     'statistics when the corresponding tables are transactional.'
> E     '|  output: count(*)'
> E     'is either a) less than -1, or b) 0 but the size of all the files inside '
> E     '00:SCAN HDFS [test_corrupt_stats_4d6cb186.corrupted]'
> E     '03:AGGREGATE [FINALIZE]'
> E     '   HDFS partitions=1/2 files=1 size=24B'
> E     '01:AGGREGATE'
> E     'If it is suspected that there may be corrupt statistics, dropping and '
> E     'The latter case does not necessarily imply the existence of corrupt '
> E     'Max Per-Host Resource Reservation: Memory=8.00KB Threads=3'
> E     '   row-size=0B cardinality=unavailable'
> E     'test_corrupt_stats_4d6cb186.corrupted'
> E     'The row count in one or more partitions in the following tables '
> E     '|  output: count:merge(*)'
> E     'Per-Host Resource Estimates: Memory=32MB'
> E     '02:EXCHANGE [UNPARTITIONED]'
> E     're-computing statistics could resolve this problem.'
> E     'WARNING: The following tables are missing relevant table and/or column statistics.'
> E     '|'
> {code}
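
The assertion in the trace is a subset check: every expected plan line must
appear among the actual plan lines. A minimal sketch of that check, using a
few strings copied from the trace above, shows why it fails. The helper
name missing_expected is hypothetical and is not part of the Impala test
framework.

```python
def missing_expected(expected, actual):
    """Return the expected lines that do not appear in the actual output."""
    return sorted(set(expected) - set(actual))


# Lines copied from the trace; only a few of the actual lines are included.
expected_lines = {"   row-size=0B cardinality=1"}
actual_lines = {
    "|  row-size=8B cardinality=1",
    "   row-size=0B cardinality=unavailable",
    "PLAN-ROOT SINK",
}

if __name__ == "__main__":
    # Set "<=" is the subset test used by the assertion in the trace.
    print(expected_lines <= actual_lines)
    print(missing_expected(expected_lines, actual_lines))
```

The scan node reports cardinality=unavailable where the test expects
cardinality=1, so the expected line is missing from the actual set.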



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
