[ 
https://issues.apache.org/jira/browse/SPARK-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553849#comment-14553849
 ] 

Edoardo Vacchi edited comment on SPARK-7754 at 5/21/15 8:24 AM:
----------------------------------------------------------------

That's a fair observation; on the other hand:

1) is the code concerning rule application expecting throws at all? 
2) RuleExecutor traces rules by their ruleName; if you really need named rules, 
you could use `rule.named("...") { }}`
   this does not solve the stack trace problem, but you'll still have 
meaningful information in the log

also, this could be partially worked around by carefully modularizing rules 
within container objects/traits


was (Author: evacchi):
That's a fair observation; on the other hand:

1) is the code concerning rule application expecting throws at all? 
2) RuleExecutor traces rules by their ruleName; if you really need named rules, 
you could use `rule.named("...") { }}`
   this does not solve the stack trace problem, but you'll still have 
meaningful information in the log


> Use PartialFunction literals instead of objects in Catalyst
> -----------------------------------------------------------
>
>                 Key: SPARK-7754
>                 URL: https://issues.apache.org/jira/browse/SPARK-7754
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Edoardo Vacchi
>
> Catalyst rules extend two distinct "rule" types: {{Rule[LogicalPlan]}} and 
> {{Strategy}} (which is an alias for {{GenericStrategy[SparkPlan]}}).
> The distinction is fairly subtle: in the end, both rule types are supposed to 
> define a method {{apply(plan: LogicalPlan)}}
> (where LogicalPlan is either Logical- or Spark-) which returns a transformed 
> plan (or a sequence thereof, in the case
> of Strategy).
> Ceremonies asides, the body of such method is always of the kind:
> {code:java}
>      def apply(plan: PlanType) = plan match pf
> {code}
> where `pf` would be some `PartialFunction` of the PlanType:
> {code:java}
>       val pf = {
>         case ... => ...
>       }
> {code}
> This is JIRA is a proposal to introduce utility methods to
>   a) reduce the boilerplate to define rewrite rules
>   b) turning them back into what they essentially represent: function types.
> These changes would be backwards compatible, and would greatly help in 
> understanding what the code does. Current use of objects is redundant and 
> possibly confusing.
> *{{Rule[LogicalPlan]}}*
> a) Introduce the utility object
> {code:java}
>   object rule 
>     def rule(pf: PartialFunction[LogicalPlan, LogicalPlan]): 
> Rule[LogicalPlan] =
>       new Rule[LogicalPlan] {
>         def apply (plan: LogicalPlan): LogicalPlan = plan transform pf
>       }
>     def named(name: String)(pf: PartialFunction[LogicalPlan, LogicalPlan]): 
> Rule[LogicalPlan] =
>       new Rule[LogicalPlan] {
>         override val ruleName = name
>         def apply (plan: LogicalPlan): LogicalPlan = plan transform pf
>       }
> {code}
> b) progressively replace the boilerplate-y object definitions; e.g.
> {code:java}
>     object MyRewriteRule extends Rule[LogicalPlan] {
>       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
>         case ... => ...
>     }
> {code}
> with
> {code:java}
>     // define a Rule[LogicalPlan]
>     val MyRewriteRule = rule {
>       case ... => ...
>     }
> {code}
> and/or :
> {code:java}
>     // define a named Rule[LogicalPlan]
>     val MyRewriteRule = rule.named("My rewrite rule") {
>       case ... => ...
>     }
> {code}
> *Strategies*
> A similar solution could be applied to shorten the code for
> Strategies, which are total functions
> only because they are all supposed to manage the default case,
> possibly returning `Nil`. In this case
> we might introduce the following utility:
> {code:java}
> object strategy {
>   /**
>    * Generate a Strategy from a PartialFunction[LogicalPlan, SparkPlan].
>    * The partial function must therefore return *one single* SparkPlan for 
> each case.
>    * The method will automatically wrap them in a [[Seq]].
>    * Unhandled cases will automatically return Seq.empty
>    */
>   def apply(pf: PartialFunction[LogicalPlan, SparkPlan]): Strategy =
>     new Strategy {
>       def apply(plan: LogicalPlan): Seq[SparkPlan] =
>         if (pf.isDefinedAt(plan)) Seq(pf.apply(plan)) else Seq.empty
>     }
>   /**
>    * Generate a Strategy from a PartialFunction[ LogicalPlan, Seq[SparkPlan] 
> ].
>    * The partial function must therefore return a Seq[SparkPlan] for each 
> case.
>    * Unhandled cases will automatically return Seq.empty
>    */
>  def seq(pf: PartialFunction[LogicalPlan, Seq[SparkPlan]]): Strategy =
>     new Strategy {
>       def apply(plan: LogicalPlan): Seq[SparkPlan] =
>         if (pf.isDefinedAt(plan)) pf.apply(plan) else Seq.empty[SparkPlan]
>     }
> }
> {code}
> Usage:
> {code:java}
> val mystrategy = strategy { case ... => ... }
> val seqstrategy = strategy.seq { case ... => ... }
> {code}
> *Further possible improvements:*
> Making the utility methods `implicit`, thereby
> further reducing the rewrite rules to:
> {code:java}
>     // define a PartialFunction[LogicalPlan, LogicalPlan]
>     // the implicit would convert it into a Rule[LogicalPlan] at the use sites
>     val MyRewriteRule = {
>       case ... => ...
>     }
> {code}
> *Caveats*
> Because of the way objects are initialized vs. vals, it might be necessary
> reorder instructions so that vals are actually initialized before they are 
> used.
> E.g.:
> {code:java}
> class MyOptimizer extends Optimizer {
>   override val batches: Seq[Batch] =
>       ...
>       Batch("Other rules", FixedPoint(100),
>         MyRewriteRule // <--- might throw NPE
>   val MyRewriteRule = ...
> }
> {code}
> this is easily fixed by factoring rules out as follows:
> {code:java}
> class MyOptimizer extends Optimizer with CustomRules {
>   val MyRewriteRule = ...
>   override val batches: Seq[Batch] =
>       ...
>       Batch("Other rules", FixedPoint(100),
>         MyRewriteRule 
> }
> {code}
> or, even better, further modularizing, e.g using traits or container objects:
> {code:java}
> class MyOptimizer extends Optimizer with CustomRules {
>   override val batches: Seq[Batch] =
>       ...
>       Batch("Other rules", FixedPoint(100),
>         MyRewriteRule 
> }
> trait CustomRules {
>   val MyRewriteRule = ...
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to