[ 
https://issues.apache.org/jira/browse/IMPALA-13437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18021270#comment-18021270
 ] 

ASF subversion and git services commented on IMPALA-13437:
----------------------------------------------------------

Commit 3181fe18006e392e0ce3f2f48fe285569ccfd148 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3181fe180 ]

IMPALA-13437 (part 1): Compute processing cost before TupleCachePlanner

This is a preparatory change for cost-based placement of
TupleCacheNodes. It reorders planning so that the processing cost and
filtered cardinality are calculated before the TupleCachePlanner runs.
With this change, the processing cost is computed and shown in the
explain plan output whenever enable_tuple_cache=true. This does not
affect the adjustment of fragment parallelism, which continues to be
controlled by the compute_processing_cost option.

The processing cost is used to calculate a cumulative processing
cost in the TupleCacheInfo: the total processing cost of everything
below that point in the plan, including other fragments. It indicates
how much processing a cache hit could avoid. The cost is not
accumulated when a TupleCacheInfo is merged in because of a runtime
filter, as that cost is not actually being avoided. This change also
computes the estimated serialized size for the TupleCacheNode from the
filtered cardinality and the row size.
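
As a rough illustration of the two estimates described above, the
following Python sketch models a plan as a small tree. It is not Impala
code: PlanNodeSketch, runtime_filter_sources, and both function names
are hypothetical, and it assumes the serialized size is simply the
filtered cardinality multiplied by the average row size.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNodeSketch:
    """Hypothetical stand-in for a plan node; not Impala's actual class."""
    name: str
    processing_cost: int        # cost of this node alone
    filtered_cardinality: int   # estimated rows after filtering
    avg_row_size: float         # estimated bytes per row
    children: List["PlanNodeSketch"] = field(default_factory=list)
    # Nodes whose info is merged in only because they feed a runtime filter.
    runtime_filter_sources: List["PlanNodeSketch"] = field(default_factory=list)


def cumulative_processing_cost(node: PlanNodeSketch) -> int:
    """Sum this node's cost with the cumulative cost of all its children."""
    total = node.processing_cost
    for child in node.children:
        total += cumulative_processing_cost(child)
    # runtime_filter_sources are deliberately skipped: a cache hit at this
    # node does not avoid the work done to build the runtime filter.
    return total


def estimated_serialized_size(node: PlanNodeSketch) -> int:
    """Assumed estimate: filtered cardinality times average row size."""
    return int(node.filtered_cardinality * node.avg_row_size)


# Made-up numbers: a join whose build side sends a runtime filter to the
# probe-side scan.
build_scan = PlanNodeSketch("build_scan", 40, 50, 8.0)
probe_scan = PlanNodeSketch("probe_scan", 100, 1000, 16.0,
                            runtime_filter_sources=[build_scan])
join = PlanNodeSketch("join", 30, 200, 24.0,
                      children=[probe_scan, build_scan])
```

In this sketch a cache placed at the join could avoid the cost of the
join and both scans, while a cache placed at probe_scan counts only the
scan's own cost, even though build_scan's information is merged in for
the runtime filter.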

Testing:
 - Ran a core job

Change-Id: If78f5d002b0e079eef1eece612f0d4fefde545c7
Reviewed-on: http://gerrit.cloudera.org:8080/23164
Reviewed-by: Yida Wu <wydbaggio...@gmail.com>
Reviewed-by: Michael Smith <michael.sm...@cloudera.com>
Tested-by: Michael Smith <michael.sm...@cloudera.com>


> Improve heuristics for placing the tuple cache nodes
> ----------------------------------------------------
>
>                 Key: IMPALA-13437
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13437
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 4.5.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Major
>
> Improve placement of tuple cache nodes by considering:
>  # Selectivity
>  # Result Size
>  # Operator cost
>  # Data change frequency (maybe followup)
>  # Etc
> This should avoid caching large results that don't provide a major performance 
> improvement.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org