Riza Suminto created IMPALA-11972:
-------------------------------------
Summary: Factor in row width during ProcessingCost calculation.
Key: IMPALA-11972
URL: https://issues.apache.org/jira/browse/IMPALA-11972
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Affects Versions: Impala 4.3.0
Reporter: Riza Suminto
Assignee: Riza Suminto
IMPALA-11604 added the ProcessingCost (PC) concept to measure the cost for a
distinct PlanNode / DataSink / PlanFragment to process its input rows globally
across all of its instances.
We should investigate whether row width should be considered when computing PC
for more operators, and whether doing so would make the PC model more accurate.
The code in IMPALA-11604 has a materialization cost parameter to accommodate PC
cases where row width should factor in. Currently, the PC of ScanNode,
ExchangeNode, and DataStreamSink has row width factored in through this
materialization parameter.
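As a rough illustration of how a materialization term lets row width factor
into PC, here is a minimal Java sketch. The formula (input rows times a per-row
cost plus row width times a per-byte cost), the class name RowWidthCostSketch,
and both coefficients are hypothetical assumptions, not the formula or API
used in IMPALA-11604.

```java
// Hypothetical sketch: names, formula, and constants are illustrative only,
// not Impala's actual ProcessingCost implementation.
final class RowWidthCostSketch {
    // Assumed cost coefficient charged once per input row (hypothetical).
    static final double PER_ROW_COST = 1.0;
    // Assumed cost coefficient charged per materialized byte (hypothetical).
    static final double PER_BYTE_COST = 0.1;

    /**
     * Cost = inputRows * (PER_ROW_COST + avgRowWidthBytes * PER_BYTE_COST),
     * rounded up to a whole cost unit.
     */
    static long processingCost(long inputRows, double avgRowWidthBytes) {
        if (inputRows < 0 || avgRowWidthBytes < 0) {
            throw new IllegalArgumentException("inputs must be non-negative");
        }
        double cost = inputRows * (PER_ROW_COST + avgRowWidthBytes * PER_BYTE_COST);
        return (long) Math.ceil(cost);
    }

    public static void main(String[] args) {
        // Same row count, different row widths: wider rows cost more.
        System.out.println(processingCost(1_000_000L, 16.0));
        System.out.println(processingCost(1_000_000L, 256.0));
    }
}
```

With a zero row width the cost degenerates to a pure per-row cost, which is
roughly how operators that ignore row width behave today; a nonzero width
makes two inputs with the same cardinality but different row sizes cost
differently.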
For VARCHAR, we can use some kind of average width stats, if available. For
fixed-width columns, we just use the width. In both cases, the unit should be
bytes. The goal of including width in costing is to make the outcome more
precise and less error-prone.
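The per-row width estimate itself could be sketched as below, in bytes, using
a VARCHAR column's average width stat when one is present and the type's byte
size for fixed-width columns. The class and method names are hypothetical, and
falling back to the declared maximum VARCHAR length when no stat is available
is an assumption; the issue does not specify a fallback.

```java
import java.util.List;
import java.util.OptionalDouble;

// Hypothetical sketch of a row width estimate in bytes; not Impala code.
final class RowWidthEstimator {
    enum Kind { FIXED, VARCHAR }

    /** A column: its kind, fixed byte size or declared max length, and optional avg width stat. */
    static final class Column {
        final String name;
        final Kind kind;
        final int bytes;                    // fixed size, or declared max length for VARCHAR
        final OptionalDouble avgWidthBytes; // only meaningful for VARCHAR

        private Column(String name, Kind kind, int bytes, OptionalDouble avgWidthBytes) {
            this.name = name;
            this.kind = kind;
            this.bytes = bytes;
            this.avgWidthBytes = avgWidthBytes;
        }

        static Column fixed(String name, int bytes) {
            return new Column(name, Kind.FIXED, bytes, OptionalDouble.empty());
        }

        static Column varchar(String name, int maxLen, OptionalDouble avgWidthBytes) {
            return new Column(name, Kind.VARCHAR, maxLen, avgWidthBytes);
        }
    }

    /** Width of one column in bytes. */
    static double columnWidth(Column c) {
        if (c.kind == Kind.FIXED) {
            return c.bytes; // fixed-width columns: just use the width
        }
        // VARCHAR: prefer the average width stat; otherwise fall back to
        // the declared max length (assumed fallback).
        return c.avgWidthBytes.orElse(c.bytes);
    }

    /** Estimated row width in bytes: the sum of the column widths. */
    static double rowWidth(List<Column> columns) {
        double total = 0.0;
        for (Column c : columns) {
            total += columnWidth(c);
        }
        return total;
    }
}
```

For example, an 8-byte and a 4-byte fixed-width column plus a VARCHAR(100)
whose average width stat is 12.5 bytes give an estimated row width of 24.5
bytes; without the stat, the assumed fallback would give 112 bytes, which
shows why an average width stat makes the estimate less pessimistic.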
--
This message was sent by Atlassian Jira
(v8.20.10#820010)