[ https://issues.apache.org/jira/browse/DRILL-7570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva updated DRILL-7570:
------------------------------------
    Labels: ready-to-commit  (was: )

> Fix unstable statistics tests
> -----------------------------
>
>                 Key: DRILL-7570
>                 URL: https://issues.apache.org/jira/browse/DRILL-7570
>             Project: Apache Drill
>          Issue Type: Task
>    Affects Versions: 1.17.0
>            Reporter: Vova Vysotskyi
>            Assignee: Vova Vysotskyi
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.18.0
>
>
> Drill contains tests that check that statistics are applied; some of them
> also use sampling to calculate statistics values.
> Sampling adds a limit above the scan, but the tests verify that statistics
> were applied by checking the estimated row count. A limit without sorting
> doesn't guarantee consistent results, so these tests may fail intermittently:
> {noformat}
> [ERROR]   TestMetastoreCommands.testAnalyzeWithSampleStatistics:2739 Did not find expected pattern in plan: Filter\(condition.*\).*rowcount = 96.25,
> 00-00    Screen : rowType = RecordType(ANY employee_id): rowcount = 105.0, cumulative cost = {2530.5 rows, 7570.5 cpu, 2310.0 io, 0.0 network, 0.0 memory}, id = 336738
> 00-01      Project(employee_id=[$1]) : rowType = RecordType(ANY employee_id): rowcount = 105.0, cumulative cost = {2520.0 rows, 7560.0 cpu, 2310.0 io, 0.0 network, 0.0 memory}, id = 336737
> 00-02        SelectionVectorRemover : rowType = RecordType(ANY department_id, ANY employee_id): rowcount = 105.0, cumulative cost = {2415.0 rows, 7455.0 cpu, 2310.0 io, 0.0 network, 0.0 memory}, id = 336736
> 00-03          Filter(condition=[=($0, 2)]) : rowType = RecordType(ANY department_id, ANY employee_id): rowcount = 105.0, cumulative cost = {2310.0 rows, 7350.0 cpu, 2310.0 io, 0.0 network, 0.0 memory}, id = 336735
> 00-04            Scan(table=[[dfs, tmp, employeeWithStatsFile]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/home/runner/work/drill/drill/exec/java-exec/target/org.apache.drill.exec.sql.TestMetastoreCommands/dfsTestTmp/1580901135676-0/employeeWithStatsFile/0_0_0.parquet]], selectionRoot=/home/runner/work/drill/drill/exec/java-exec/target/org.apache.drill.exec.sql.TestMetastoreCommands/dfsTestTmp/1580901135676-0/employeeWithStatsFile, numFiles=1, numRowGroups=1, usedMetadataFile=false, usedMetastore=true, filter=equal(`department_id`, 2) , columns=[`department_id`, `employee_id`]]]) : rowType = RecordType(ANY department_id, ANY employee_id): rowcount = 1155.0, cumulative cost = {1155.0 rows, 2310.0 cpu, 2310.0 io, 0.0 network, 0.0 memory}, id = 336734
>  expected:<true> but was:<false> {noformat}
> List of tests to fix:
>  - TestMetastoreCommands.testAnalyzeWithSampleStatistics;
>  - TestAnalyze.basic3.
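The two row counts in the failing plan can be reproduced with simple arithmetic. This is a minimal sketch under an assumption that is not taken from Drill's code: that the estimated row count of an equality filter is the scan row count divided by the number of distinct values (NDV) of `department_id` computed from the sample. Under that assumption, an NDV of 12 gives the expected 96.25 and an NDV of 11 gives the observed 105.0, so two samples that happen to see a different number of distinct values would produce different estimates. The class `RowCountEstimate` and method `estimate` are hypothetical.

```java
// Hypothetical illustration only; this is NOT Drill's actual row count estimator.
public class RowCountEstimate {

    // Assumed model: equality-filter selectivity = 1 / NDV,
    // so the estimated filter row count = scan row count / NDV.
    static double estimate(double scanRowCount, long ndv) {
        if (ndv <= 0) {
            throw new IllegalArgumentException("ndv must be positive");
        }
        return scanRowCount / ndv;
    }

    public static void main(String[] args) {
        double scanRows = 1155.0; // Scan rowcount reported in the plan above

        // A sample that sees 12 distinct department_id values
        System.out.println(estimate(scanRows, 12)); // prints 96.25

        // A sample that sees only 11 distinct department_id values
        System.out.println(estimate(scanRows, 11)); // prints 105.0
    }
}
```

If the assumed model holds, a test that matches the exact string `rowcount = 96.25` depends on which rows the unsorted limit happened to return during sampling.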



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
