[
https://issues.apache.org/jira/browse/CALCITE-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879238#comment-16879238
]
Vladimir Sitnikov commented on CALCITE-3173:
--------------------------------------------
[~donnyzone], thanks for the analysis.
As you might see, I've been there multiple times as well.
TL;DR is we want to keep the implementation simple to a some definition of
"simple".
I am not sure "Visitor" approach would help there. It is not I forbid to change
the implementation to "Vistors", however I just don't think it would make
things simpler.
For instance, I've tried Visitors in linq4j, and it is really complicated.
It might be Tagless Final interpreter would help there, however Java does not
suit well for that approach yet (e.g. see
https://medium.com/@johnmcclean/powerful-extensible-code-with-tagless-final-in-java-4094f923cdea
)
The other approach is to use "proper" code optimization framework. I don't
know. Build SSA or whatever.
I guess the key problem of BlockBuilder (or the approach it is used) is we
don't really track data flow and control flow. That is at certain points we
might want to propagate certain knowledge (e.g "we are inside if(inp4!=null) so
we can just assume inp4 is not null"), however we don't yet keep track of the
control flow.
As you show in the example, "optimized" code becomes "magically" valid even
though "unoptimized" is invalid.
However we might even prefer to remove (or restrict) certain optimizations
rather than rewrite the generator.
> RexNode Code Generation Problem
> -------------------------------
>
> Key: CALCITE-3173
> URL: https://issues.apache.org/jira/browse/CALCITE-3173
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: next
> Reporter: Feng Zhu
> Priority: Major
> Attachments: code.png
>
>
> *Abstract:* Both RexImpTable and BlockBuilder have codegen issues, but it is
> interesting that they can work together well for most cases.
> We can illustrate the problem with a simple test case in JdbcTest, in
> which the "commission" column is nullable.
> {code:java}
> @Test public void testNullSafeCheck() {
> CalciteAssert.hr()
> .query("select \"commission\" + 10 as s from \"hr\".\"emps\"")
> .returns("S=1010\n"
> + "S=510\n"
> + "S=null\n"
> + "S=260\n");
> }
> {code}
> This test case can pass as the BlockBuilder is in default optimization
> mode. However, when we set it into un-optimization mode in _EnumerableCalc_,
> this test will fail due to NPE. The following picture demonstrates their
> differences.
> !code.png!
> *1.RexImpTable generates unsafe code*
> Before translating the RexCall (_*Add*_), RexImpTable firstly translate
> its operands with (nullAs=*IS_NULL*) [1] to detect whether it is null (i.e.,
> {color:#ff0000}_inp4_unboxed_{color}). Then RexImpTable sets this operand's
> nullable in RexToLixTranslator as {color:#FF0000}false{color} [2]. After
> that, the operand will be translated again with *NOT_POSSIBLE* [3] to get the
> value (i.e., {color:#ff0000}_inp4_0_unboxed_{color}). In the end, the RexCall
> is implemented by NotNullImplementor.However, it is not safe to conduct
> operations like unboxing in the second translation phase.
> *2.BlockBuiler optimization's semantic issue buries NPE*
> BlockBuilder.optimize() changes the code semantic in this case. For
> conditional-like clauses (if...else, ?:, etc), InlineVariableVisitor will
> wrongly make variables inlined.
> In general, they can work together for most cases. However, when some
> special branch is triggered by query, the problem will be exposed. For
> example, the NewExpression (_new java.math.BigDecimal_) in CALCITE-3143
> breaks the inline optimization phase.
>
> *How to fix?*
> I have digged into this problem a couple of days and tried many
> approaches to fix it. But in this progress, I found the limitation in current
> implementation. The whole recursive framework essentially conducts a
> sequential codegen beheavior, and may visit a RexNode again and again with
> different NullAs settings.
> Due to the limitation, it is difficult to implement null-safe codegen
> semantics with branching logic. We can also find that there are many branches
> for special cases in current implementation. Even we can handle potential
> issues every time, the logic will becomes more and more complex and
> unfriendly for maintenance.
>
> Therefore, I propose to re-consider this part, including several initial
> points.
> (1) {color:#ff0000}_Visitor Pattern_{color} (RexVisitor<Result>).
> Theoretically, RexNode can be translated into Expression by visiting the node
> only once. We can implement RexVisitor rather than current recursive
> translation.
> (2)The {color:#ff0000}Result{color} consists of three items (code:
> BuilderStatement, isNull: ParameterExpression, value: Expression).So it is
> easy to decide how to implement a RexNode according to its children.
>
> Please correct me if I make something wrong. Look forward to suggestions!
>
> [1][https://github.com/apache/calcite/blob/1748f0503e7b626a8d0165f1698adb8b61bbc31e/core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java#L1062]
> [2][https://github.com/apache/calcite/blob/1748f0503e7b626a8d0165f1698adb8b61bbc31e/core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java#L1064]
> [3][https://github.com/apache/calcite/blob/1748f0503e7b626a8d0165f1698adb8b61bbc31e/core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java#L1113]
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)