Hi everybody, At the moment, Catalyst rules are defined using two different types of rules: `Rule[LogicalPlan]` and `Strategy` (which in turn maps to `GenericStrategy[SparkPlan]`).
I propose to introduce utility methods to a) reduce the boilerplate to define rewrite rules b) turning them back into what they essentially represent: function types. These changes would be backwards compatible, and would greatly help in understanding what the code does. Personally, I feel like the current use of objects is redundant and possibly confusing. ## `Rule[LogicalPlan]` The analyzer and optimizer use `Rule[LogicalPlan]`, which, besides defining a default `val ruleName` only defines the method `apply(plan: TreeType): TreeType`. Because the body of such method is always supposed to read `plan match pf`, with `pf` being some `PartialFunction[LogicalPlan, LogicalPlan]`, we can conclude that `Rule[LogicalPlan]` might be substituted by a PartialFunction. I propose the following: a) Introduce the utility method def rule(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] = new Rule[LogicalPlan] { def apply (plan: LogicalPlan): LogicalPlan = plan transform pf } b) progressively replace the boilerplate-y object definitions; e.g. object MyRewriteRule extends Rule[LogicalPlan] { def apply(plan: LogicalPlan): LogicalPlan = plan transform { case ... => ... } with // define a Rule[LogicalPlan] val MyRewriteRule = rule { case ... => ... } it might also be possible to make rule method `implicit`, thereby further reducing MyRewriteRule to: // define a PartialFunction[LogicalPlan, LogicalPlan] // the implicit would convert it into a Rule[LogicalPlan] at the use sites val MyRewriteRule = { case ... => ... } ## Strategies A similar solution could be applied to shorten the code for Strategies, which are total functions only because they are all supposed to manage the default case, possibly returning `Nil`. In this case we might introduce the following utility methods: /** * Generate a Strategy from a PartialFunction[LogicalPlan, SparkPlan]. * The partial function must therefore return *one single* SparkPlan for each case. * The method will automatically wrap them in a [[Seq]]. * Unhandled cases will automatically return Seq.empty */ protected def rule(pf: PartialFunction[LogicalPlan, SparkPlan]): Strategy = new Strategy { def apply(plan: LogicalPlan): Seq[SparkPlan] = if (pf.isDefinedAt(plan)) Seq(pf.apply(plan)) else Seq.empty } /** * Generate a Strategy from a PartialFunction[ LogicalPlan, Seq[SparkPlan] ]. * The partial function must therefore return a Seq[SparkPlan] for each case. * Unhandled cases will automatically return Seq.empty */ protected def seqrule(pf: PartialFunction[LogicalPlan, Seq[SparkPlan]]): Strategy = new Strategy { def apply(plan: LogicalPlan): Seq[SparkPlan] = if (pf.isDefinedAt(plan)) pf.apply(plan) else Seq.empty[SparkPlan] } Thanks in advance e.v. --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org