[GitHub] [beam] mosche commented on issue #24314: [Bug]: Poor SQL -> PTransform translation in SqlTransform

GitBox Fri, 02 Dec 2022 00:21:23 -0800


mosche commented on issue #24314:
URL: https://github.com/apache/beam/issues/24314#issuecomment-1334895553

@apilloud I've already checked / compared `SQLPlan` and `BEAMPlan`. In the
logs both plans look fine, the subtile difference can only be observed if
checking the instances / ids of `RelNodes` in the plan.

If a query contains a subquery that is repeatedly used (either by
referencing a named subquery or just by repeating the same query string), this
is reduced to a single instance RelNode that is reused throughout `SQLPlan`.

The problem is that the `CheapestPlanReplacer ` copies all `RelNodes` after
applying a rule, in this case the `BeamRelNode` related rules. After applying
Beam's rules, `BEAMPlan ` has potentially lost a lot of semantic information
about the query: knowledge of repeatedly used subqueries. This is rather bad
because it's not possible for runners to optimize for that usage pattern
anymore.

This is the
[hack](https://github.com/mosche/beam/commit/874fa34e7ed8567678443775adf520d326a30243)
i've used to generate the transform hierarchies I've visualized above:
- `denormalized`: The situation as-is when `BeamRelNodes` are copied leading
to independent `PCollection` instances.
- `normalized`: A "hacked" copy that returns a cached `BeamRelNode` instance
if previously copied with the same parameters. This way the semantic
information is kept in the plan. And when `SqlTransform` is expanded we end up
reusing such `PCollections` as these are cached by `RelNode#getId` (see
[here](https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSqlRelUtils.java#L91-L110))

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [beam] mosche commented on issue #24314: [Bug]: Poor SQL -> PTransform translation in SqlTransform

Reply via email to