[ 
https://issues.apache.org/jira/browse/IMPALA-10588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302757#comment-17302757
 ] 

Aman Sinha commented on IMPALA-10588:
-------------------------------------

Seems related to IMPALA-10503, although I don't know why the number of files is 
different even though the cardinality in both scans is 6M. [~kdeschle], would 
you be able to look into this?

> PlannerTest/resource-requirements.test fails with bad mem estimates (from 
> number of files?)
> -------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-10588
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10588
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 4.0
>            Reporter: Andrew Sherman
>            Assignee: Kurt Deschler
>            Priority: Critical
>
> We see an unexpected plan for "select * from 
> tpch_orc_def.lineitem" with Hive v3.
> The first line that differs is 
> {code}
> Per-Host Resource Estimates: Memory=188MB 
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> {code}
> but in the scan we see
> {code}
> HDFS partitions=1/1 files=1 size=142.84MB
> {code}
> instead of the expected 
> {code}
> HDFS partitions=1/1 files=5 size=142.72MB
> {code}
> Could this be a regression from the recent change IMPALA-10503 which changed 
> data loading?
> {code}
> Section PLAN of query:
> select * from tpch_orc_def.lineitem
> Actual does not match expected result:
> Max Per-Host Resource Reservation: Memory=12.00MB Threads=2
> Per-Host Resource Estimates: Memory=188MB
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Analyzed query: SELECT * FROM tpch_orc_def.lineitem
> F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
> |  Per-Host Resources: mem-estimate=188.00MB mem-reservation=12.00MB 
> thread-reservation=2
> PLAN-ROOT SINK
> |  output exprs: tpch_orc_def.lineitem.l_orderkey, 
> tpch_orc_def.lineitem.l_partkey, tpch_orc_def.lineitem.l_suppkey, 
> tpch_orc_def.lineitem.l_linenumber, tpch_orc_def.lineitem.l_quantity, 
> tpch_orc_def.lineitem.l_extendedprice, tpch_orc_def.lineitem.l_discount, 
> tpch_orc_def.lineitem.l_tax, tpch_orc_def.lineitem.l_returnflag, 
> tpch_orc_def.lineitem.l_linestatus, tpch_orc_def.lineitem.l_shipdate, 
> tpch_orc_def.lineitem.l_commitdate, tpch_orc_def.lineitem.l_receiptdate, 
> tpch_orc_def.lineitem.l_shipinstruct, tpch_orc_def.lineitem.l_shipmode, 
> tpch_orc_def.lineitem.l_comment
> |  mem-estimate=100.00MB mem-reservation=4.00MB spill-buffer=2.00MB 
> thread-reservation=0
> |
> 00:SCAN HDFS [tpch_orc_def.lineitem]
>    HDFS partitions=1/1 files=1 size=142.84MB
>    stored statistics:
>      table: rows=6.00M size=142.84MB
>      columns: all
>    extrapolated-rows=disabled max-scan-range-rows=6.00M
>    mem-estimate=88.00MB mem-reservation=8.00MB thread-reservation=1
>    tuple-ids=0 row-size=231B cardinality=6.00M
>    in pipelines: 00(GETNEXT)
> Expected:
> Max Per-Host Resource Reservation: Memory=12.00MB Threads=2
> Per-Host Resource Estimates: Memory=140MB
> Analyzed query: SELECT * FROM tpch_orc_def.lineitem
> F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
> |  Per-Host Resources: mem-estimate=140.00MB mem-reservation=12.00MB 
> thread-reservation=2
> PLAN-ROOT SINK
> |  output exprs: tpch_orc_def.lineitem.l_orderkey, 
> tpch_orc_def.lineitem.l_partkey, tpch_orc_def.lineitem.l_suppkey, 
> tpch_orc_def.lineitem.l_linenumber, tpch_orc_def.lineitem.l_quantity, 
> tpch_orc_def.lineitem.l_extendedprice, tpch_orc_def.lineitem.l_discount, 
> tpch_orc_def.lineitem.l_tax, tpch_orc_def.lineitem.l_returnflag, 
> tpch_orc_def.lineitem.l_linestatus, tpch_orc_def.lineitem.l_shipdate, 
> tpch_orc_def.lineitem.l_commitdate, tpch_orc_def.lineitem.l_receiptdate, 
> tpch_orc_def.lineitem.l_shipinstruct, tpch_orc_def.lineitem.l_shipmode, 
> tpch_orc_def.lineitem.l_comment
> |  mem-estimate=100.00MB mem-reservation=4.00MB spill-buffer=2.00MB 
> thread-reservation=0
> |
> 00:SCAN HDFS [tpch_orc_def.lineitem]
>    HDFS partitions=1/1 files=5 size=142.72MB
>    stored statistics:
>      table: rows=6.00M size=142.72MB
>      columns: all
>    extrapolated-rows=disabled max-scan-range-rows=1.73M
>    mem-estimate=40.00MB mem-reservation=8.00MB thread-reservation=1
>    tuple-ids=0 row-size=231B cardinality=6.00M
>    in pipelines: 00(GETNEXT)
> {code}
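
To make the differences between the two quoted plans easier to see, here is a minimal Python sketch that extracts the key=value tokens from the two scan nodes and prints only the fields that differ. The scan-node lines are copied from the plans above; the helper names (parse_fields, diff_fields) are hypothetical and are not part of Impala or its test framework.

```python
import re

# Scan-node lines copied from the "Actual" and "Expected" plans quoted above.
ACTUAL = """\
HDFS partitions=1/1 files=1 size=142.84MB
extrapolated-rows=disabled max-scan-range-rows=6.00M
mem-estimate=88.00MB mem-reservation=8.00MB thread-reservation=1
tuple-ids=0 row-size=231B cardinality=6.00M
"""
EXPECTED = """\
HDFS partitions=1/1 files=5 size=142.72MB
extrapolated-rows=disabled max-scan-range-rows=1.73M
mem-estimate=40.00MB mem-reservation=8.00MB thread-reservation=1
tuple-ids=0 row-size=231B cardinality=6.00M
"""


def parse_fields(text):
    """Return a dict of the key=value tokens found in plan text (hypothetical helper)."""
    return dict(re.findall(r"([\w-]+)=(\S+)", text))


def diff_fields(actual, expected):
    """Return {key: (actual_value, expected_value)} for fields whose values differ."""
    a, e = parse_fields(actual), parse_fields(expected)
    keys = sorted(a.keys() | e.keys())
    return {k: (a.get(k), e.get(k)) for k in keys if a.get(k) != e.get(k)}


differences = diff_fields(ACTUAL, EXPECTED)
for key, (got, want) in differences.items():
    print(f"{key}: actual={got} expected={want}")
```

Only files, size, max-scan-range-rows and mem-estimate differ; cardinality and row-size are the same in both plans. In both plans the per-host estimate equals the sink estimate plus the scan estimate (100MB + 88MB = 188MB versus 100MB + 40MB = 140MB), so the change in the scan's mem-estimate appears to account for the whole difference.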



