[ 
https://issues.apache.org/jira/browse/IMPALA-10696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345719#comment-17345719
 ] 

ASF subversion and git services commented on IMPALA-10696:
----------------------------------------------------------

Commit bc94f3ad57837ee31e7bde528d1edad944d56940 in impala's branch 
refs/heads/branch-4.0.0 from liuyao
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=bc94f3a ]

IMPALA-10696: fix accuracy problem

Table alltypes has no statistics, so its cardinality is estimated
from the HDFS file sizes and the average row size. When
PrintUtils.printMetric is called, a double is divided by a long, which
loses precision. In most cases the estimated number of rows is printed
as 17.91K, but with the precision loss here it comes out as 17.90K.
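
To illustrate why such a small difference is visible in the plan text, here
is a minimal Python sketch. It is not Impala's code: print_metric is a
hypothetical stand-in for a formatter that prints two decimals with a K
suffix, and the two row counts are made-up values on either side of a
rounding boundary.

```python
def print_metric(value: float) -> str:
    """Hypothetical formatter: two decimals plus a K suffix (not PrintUtils)."""
    if value >= 1000:
        return f"{value / 1000:.2f}K"
    return str(int(value))


# Two made-up estimates that differ by only two rows but sit on
# opposite sides of the rounding boundary at 17905.
print(print_metric(17906))  # 17.91K
print(print_metric(17904))  # 17.90K
```

Any estimate that lands close to such a boundary can flip the last printed
digit, so an exact string comparison in the test is fragile.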

I modified line 221 of stats-extrapolation.test to match with
row_regex, following the way cardinality is matched on line 224.
In this case the two values are the same.
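
The idea of matching by pattern instead of by exact string can be sketched
as follows. This is only an illustration in Python: the pattern below is an
assumption, not the one committed to stats-extrapolation.test, and
line_matches is a hypothetical helper, not part of the Impala test
framework.

```python
import re

# Assumed pattern: accept any row count rendered with a K suffix,
# so both 17.90K and 17.91K are treated as a match.
ROWS_PATTERN = re.compile(r"\s*partitions: 0/24 rows=\d+(\.\d+)?K")


def line_matches(actual: str) -> bool:
    """Return True if the whole plan line fits the assumed pattern."""
    return ROWS_PATTERN.fullmatch(actual) is not None


print(line_matches("     partitions: 0/24 rows=17.90K"))
print(line_matches("     partitions: 0/24 rows=17.91K"))
```

Both plan lines from the failing run and the expected output satisfy the
same pattern, which is what makes the test insensitive to the last digit.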

Testing:
metadata/test_stats_extrapolation.py

Change-Id: I0a1a3809508c90217517705b2b188b2ccba6f23f
Reviewed-on: http://gerrit.cloudera.org:8080/17411
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Jim Apple <[email protected]>


> Minor size differences breaks 
> metadata/test_stats_extrapolation.py::TestStatsExtrapolation::test_stats_extrapolation
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-10696
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10696
>             Project: IMPALA
>          Issue Type: Task
>          Components: Frontend
>    Affects Versions: Impala 4.0
>         Environment: Ubuntu 16.04, jenkins.impala.io
>            Reporter: Jim Apple
>            Assignee: liuyao
>            Priority: Blocker
>
> One test is breaking in 4.0.0 RC2, so I marked this as a blocker.
> [~liuyao], I picked you as the assignee since I thought you might be
> knowledgeable about this part of the codebase. Here's the test output:
> {noformat}
> E   assert Items in expected results not found in actual results:
> E     '     partitions: 0/24 rows=17.91K'
> E     Items in actual results:
> E     'Per-Host Resource Estimates: Memory=20MB'
> E     '|  output exprs: id'
> E     ''
> E     '   HDFS partitions=24/24 files=36 size=281.43KB'
> E     '     table: rows=unavailable size=unavailable'
> E     '   stored statistics:'
> E     '|  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0'
> E     '     columns: unavailable'
> E     '00:SCAN HDFS [test_stats_extrapolation_5c6bdfd.alltypes]'
> E     '   tuple-ids=0 row-size=4B cardinality=17.90K'
> E     '|'
> E     'Max Per-Host Resource Reservation: Memory=4.01MB Threads=2'
> E     'Analyzed query: SELECT id FROM test_stats_extrapolation_5c6bdfd.alltypes'
> E     'F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1'
> E     '     partitions: 0/24 rows=17.90K'
> E     'test_stats_extrapolation_5c6bdfd.alltypes'
> E     'PLAN-ROOT SINK'
> E     '   in pipelines: 00(GETNEXT)'
> E     '   extrapolated-rows=unavailable max-scan-range-rows=unavailable'
> E     'WARNING: The following tables are missing relevant table and/or column statistics.'
> E     '|  Per-Host Resources: mem-estimate=20.00MB mem-reservation=4.01MB thread-reservation=2'
> E     '   mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=1'
> {noformat}
>  
>  [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/13812/consoleText]
>  
> CC [~boroknagyz]


