Thanks for the prompt feedback; I have further expanded on your suggestions on this JIRA https://issues.apache.org/jira/browse/SPARK-7754
On Tue, May 19, 2015 at 8:35 PM, Michael Armbrust <mich...@databricks.com> wrote: > Overall this seems like a reasonable proposal to me. Here are a few > thoughts: > > - There is some debugging utility to the ruleName, so we would probably > want to at least make that an argument to the rule function. > - We also have had rules that operate on SparkPlan, though since there is > only one ATM maybe we don't need sugar there. > - I would not call the sugar for creating Strategies rule/seqrule, as I > think the one-to-one vs one-to-many distinction is useful. > - I'm generally pro-refactoring to make the code nicer, especially when its > not official public API, but I do think its important to maintain source > compatibility (which I think you are) when possible as there are other > projects using catalyst. > - Finally, we'll have to balance this with other code changes / conflicts. > > You should probably open a JIRA and we can continue the discussion there. > > On Tue, May 19, 2015 at 4:16 AM, Edoardo Vacchi <uncommonnonse...@gmail.com> > wrote: >> >> Hi everybody, >> >> At the moment, Catalyst rules are defined using two different types of >> rules: >> `Rule[LogicalPlan]` and `Strategy` (which in turn maps to >> `GenericStrategy[SparkPlan]`). >> >> I propose to introduce utility methods to >> >> a) reduce the boilerplate to define rewrite rules >> b) turning them back into what they essentially represent: function >> types. >> >> These changes would be backwards compatible, and would greatly help in >> understanding what the code does. Personally, I feel like the current >> use of objects is redundant and possibly confusing. >> >> ## `Rule[LogicalPlan]` >> >> The analyzer and optimizer use `Rule[LogicalPlan]`, which, besides >> defining a default `val ruleName` >> only defines the method `apply(plan: TreeType): TreeType`. >> Because the body of such method is always supposed to read `plan match >> pf`, with `pf` >> being some `PartialFunction[LogicalPlan, LogicalPlan]`, we can >> conclude that `Rule[LogicalPlan]` >> might be substituted by a PartialFunction. >> >> I propose the following: >> >> a) Introduce the utility method >> >> def rule(pf: PartialFunction[LogicalPlan, LogicalPlan]): >> Rule[LogicalPlan] = >> new Rule[LogicalPlan] { >> def apply (plan: LogicalPlan): LogicalPlan = plan transform pf >> } >> >> b) progressively replace the boilerplate-y object definitions; e.g. >> >> object MyRewriteRule extends Rule[LogicalPlan] { >> def apply(plan: LogicalPlan): LogicalPlan = plan transform { >> case ... => ... >> } >> >> with >> >> // define a Rule[LogicalPlan] >> val MyRewriteRule = rule { >> case ... => ... >> } >> >> it might also be possible to make rule method `implicit`, thereby >> further reducing MyRewriteRule to: >> >> // define a PartialFunction[LogicalPlan, LogicalPlan] >> // the implicit would convert it into a Rule[LogicalPlan] at the use >> sites >> val MyRewriteRule = { >> case ... => ... >> } >> >> >> ## Strategies >> >> A similar solution could be applied to shorten the code for >> Strategies, which are total functions >> only because they are all supposed to manage the default case, >> possibly returning `Nil`. In this case >> we might introduce the following utility methods: >> >> /** >> * Generate a Strategy from a PartialFunction[LogicalPlan, SparkPlan]. >> * The partial function must therefore return *one single* SparkPlan >> for each case. >> * The method will automatically wrap them in a [[Seq]]. >> * Unhandled cases will automatically return Seq.empty >> */ >> protected def rule(pf: PartialFunction[LogicalPlan, SparkPlan]): Strategy >> = >> new Strategy { >> def apply(plan: LogicalPlan): Seq[SparkPlan] = >> if (pf.isDefinedAt(plan)) Seq(pf.apply(plan)) else Seq.empty >> } >> >> /** >> * Generate a Strategy from a PartialFunction[ LogicalPlan, Seq[SparkPlan] >> ]. >> * The partial function must therefore return a Seq[SparkPlan] for each >> case. >> * Unhandled cases will automatically return Seq.empty >> */ >> protected def seqrule(pf: PartialFunction[LogicalPlan, >> Seq[SparkPlan]]): Strategy = >> new Strategy { >> def apply(plan: LogicalPlan): Seq[SparkPlan] = >> if (pf.isDefinedAt(plan)) pf.apply(plan) else Seq.empty[SparkPlan] >> } >> >> Thanks in advance >> e.v. >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org >> For additional commands, e-mail: dev-h...@spark.apache.org >> > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org