I wish to write a custom logical plan rule that modifies the output schema and 
grows the logical plan. The purpose of the rule is roughly to apply a 
projection on top of DatasourceV2Relation depending on some condition:

case class MyRule extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case relation: DataSourceV2Relation if someCondition(relation) =>
      Project(getExpressions(relation), relation)

We add this rule to extendedResolutionRules understanding it can’t be a 
postHocResolutionRule or optimizer rule because it modifies the output scheme 
and the Project needs to be resolved.

As extendedResolutionRule it runs in the fixed-point “Resolution” batch, 
meaning the batch keeps running indefinitely as the rule grows the query plan 
on every iteration.

It is possible to avoid the batch running indefinitely by adding to the 
relation options or node tags to mark the node was processed. This feels a 
little hacky. Is there functionality in Spark I am missing that can achieve the 
desired behavior without resorting to this?

I imagine there may be a rule in spark that deals with this but I could not 
find it.

If this is not covered, I can draft a contribution to cover this case.


Reply via email to