[ 
https://issues.apache.org/jira/browse/SPARK-6354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368649#comment-14368649
 ] 

Liang-Chi Hsieh edited comment on SPARK-6354 at 3/20/15 4:59 PM:
-----------------------------------------------------------------

h2. Introduction

Currently, SparkSQL uses cached data only when it finds a cached logical plan 
that is exactly the same as the given one. The logic is implemented in 
{{CacheManager.useCachedData}}. If we find a cached version of the logical 
plan, we replace the plan with the cached version.

This ticket extends that approach to look for a cached logical plan whose 
output contains all the output of the given logical plan. If we find a cached 
plan satisfying this condition, we can replace the given plan with the cached 
version.

h2. Current approach

The comparison logic is in {{LogicalPlan.sameResult}}. Two logical plans are 
considered the same if they satisfy the following conditions:

# They are of the same class
# They have the same number of children
# Their cleanArgs are the same
# All of their children, compared pairwise, satisfy the conditions above
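
The four conditions above can be sketched with a toy model. This is only an 
illustration in Python, not Catalyst's Scala code: the {{Plan}} class, its 
{{kind}} field (standing in for the node's class), and the string "expressions" 
are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Plan:
    """Toy stand-in for a logical plan node (hypothetical, not Catalyst's class)."""
    kind: str                           # plays the role of the node's class
    clean_args: Tuple[Any, ...] = ()    # non-child constructor arguments
    children: Tuple["Plan", ...] = ()   # child plans, in order


def same_result(a: Plan, b: Plan) -> bool:
    """Apply the four conditions listed above, recursively."""
    return (
        a.kind == b.kind                           # 1. same class
        and len(a.children) == len(b.children)     # 2. same number of children
        and a.clean_args == b.clean_args           # 3. same cleanArgs
        and all(same_result(x, y)                  # 4. children match pairwise
                for x, y in zip(a.children, b.children))
    )


# Hypothetical plans: a projection over a relation named "people".
scan = Plan("Relation", ("people",))
cached = Plan("Project", (("name", "age"),), (scan,))
query_same = Plan("Project", (("name", "age"),), (scan,))
query_narrow = Plan("Project", (("name",),), (scan,))
```

In this model {{same_result(cached, query_same)}} holds, while 
{{same_result(cached, query_narrow)}} does not, because the projection lists 
differ even though the narrower query only needs columns the cached plan 
already produces.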

h2. Proposed approach

This ticket expands the current approach. The expanded approach uses the 
cached data by looking for a cached logical plan that is a superset of the 
current logical plan. In other words, the current logical plan returns part of 
the results of the cached plan.

The comparison logic is in {{LogicalPlan.partResult}}, which takes a parameter 
{{plan: LogicalPlan}}. For the given {{plan}} to be considered part of another 
logical plan (called {{this plan}} below), the two plans must satisfy the 
following conditions:

# They are of the same class
# They have the same number of children
# The cleanArgs of the given {{plan}} are contained in those of {{this plan}}. 
For each element {{e}} at index {{i}} in the cleanArgs of the given {{plan}}:
## If {{e}} is of type {{Seq[Expression]}}, we check that the element at index 
{{i}} in the {{cleanArgs}} of {{this plan}} is also a {{Seq[Expression]}} and 
contains all elements of {{e}}
## Otherwise, we check that {{e}} is the same as the element at index {{i}} in 
the {{cleanArgs}} of {{this plan}}
# All of their children, compared pairwise, satisfy the conditions above
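
A minimal sketch of the containment check described above, again as a toy 
Python model rather than the actual Scala patch. The {{Plan}} class, the 
{{kind}} field, tuples standing in for {{Seq[Expression]}}, and strings 
standing in for expressions are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Plan:
    """Toy stand-in for a logical plan node (hypothetical, not Catalyst's class)."""
    kind: str                           # plays the role of the node's class
    clean_args: Tuple[Any, ...] = ()    # a tuple element models Seq[Expression]
    children: Tuple["Plan", ...] = ()   # child plans, in order


def part_result(this: Plan, plan: Plan) -> bool:
    """Return True if `plan` is part of `this` under the listed conditions."""
    if this.kind != plan.kind:                          # 1. same class
        return False
    if len(this.children) != len(plan.children):        # 2. same number of children
        return False
    if len(this.clean_args) != len(plan.clean_args):    # args are compared by index
        return False
    for mine, e in zip(this.clean_args, plan.clean_args):
        if isinstance(e, tuple):
            # 3a. sequence argument: `this` must hold a sequence containing all of e
            if not isinstance(mine, tuple) or not all(x in mine for x in e):
                return False
        elif e != mine:
            # 3b. any other argument must be equal
            return False
    # 4. children must satisfy the same conditions pairwise
    return all(part_result(c_this, c_plan)
               for c_this, c_plan in zip(this.children, plan.children))


# Hypothetical plans: the cached plan projects two columns, the query only one.
scan = Plan("Relation", ("people",))
cached = Plan("Project", (("name", "age"),), (scan,))
query = Plan("Project", (("name",),), (scan,))
```

Here {{part_result(cached, query)}} holds, because the query's projection list 
is contained in the cached plan's, while the reverse call 
{{part_result(query, cached)}} does not. Non-sequence arguments, such as a 
filter condition in this model, still have to match exactly.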





> Replace the plan which is part of cached query
> ----------------------------------------------
>
>                 Key: SPARK-6354
>                 URL: https://issues.apache.org/jira/browse/SPARK-6354
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Liang-Chi Hsieh
>            Priority: Minor
>
> Currently we only replace a plan that equals a cached query. This 
> approach can be extended to replace a plan that is part of a cached query.


