Re: [DISCUSS] Preserving Output Alias Names After RelNode Optimization

Julian Hyde Fri, 27 Jun 2025 12:17:43 -0700

Irresistible force meets immovable object. A lot of people want nice, 
predictable names; but I and others have explained at length why it is 
fundamentally very hard to do.


It’s hard to even specify what is the right thing. The Volcano planning 
algorithm is subtle and powerful, and the corner cases crop up surprisingly 
often.

* When ProjectMergeRule merges P1 onto P2, should the result have the column 
names of P1 or P2? It depends.

* When two equivalence sets S1 and S2 are merged, should the new set have the 
field names of S1 or S2? Should I change the field names of all RelNodes in the 
new set? Should I propagate those field names to a Filter whose input is that 
set?

* When I write “select nvl(x, y) from (select x as y, x as z from t)” should 
the output column be called x or y or z?

* If I write “select e.deptno from emp, dept”, there is an intermediate join 
that has two columns called “deptno”. Should I rename one of them to 
“deptno$1”? If so, which one?

Julian
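
The last bullet can be made concrete with a toy example. The sketch below is not Calcite code: UniquifySketch and uniquify are hypothetical names, and the "first occurrence keeps its name" policy is an assumption chosen for illustration. It shows that under such a policy, which "deptno" gets renamed depends only on the order of the join inputs, so a rule that swaps the inputs would change the names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy illustration only; this is not how Calcite names join fields.
final class UniquifySketch {

    // Assumed policy: the first occurrence keeps its name, later
    // duplicates get a "$n" suffix until the name is unused.
    static List<String> uniquify(List<String> names) {
        Set<String> used = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (String name : names) {
            String candidate = name;
            int i = 0;
            while (!used.add(candidate)) {
                i++;
                candidate = name + "$" + i;
            }
            out.add(candidate);
        }
        return out;
    }

    public static void main(String[] args) {
        // Fields of "emp join dept": emp's deptno keeps its name.
        System.out.println(uniquify(List.of("empno", "deptno", "deptno", "dname")));
        // prints [empno, deptno, deptno$1, dname]

        // Fields of "dept join emp": now dept's deptno keeps its name.
        System.out.println(uniquify(List.of("deptno", "dname", "empno", "deptno")));
        // prints [deptno, dname, empno, deptno$1]
    }
}
```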


> On Jun 27, 2025, at 12:37 AM, suibianwanwan <suibianwanwa...@foxmail.com> 
> wrote:
> 
> I don't recommend implementing this in planner.findBestExp. A planning 
> process may involve multiple Planners and could include custom Shuttles or 
> Programs (like RelFieldTrimmer, RelDecorrelator). We don't need to handle 
> this for all stages, nor attempt to preserve aliases at every stage.
> 
> As long as we ensure aliases are preserved after the Optimize phase 
> completes, that's sufficient.
> 
>> On Jun 27, 2025, at 15:11, Yanjing Wang <zhuangzixiao...@gmail.com> wrote:
>> 
>> Thank you for your detailed response. Given that consistent column naming
>> is a common challenge across organizations using Apache Calcite planner, I
>> believe we should establish a standardized approach. End users expect
>> predictable column aliases in query results, regardless of the optimization
>> process. Your proposed utility method for Calcite is promising. To
>> formalize this solution, I suggest we:
>> 
>> 1. Document this as the recommended best practice
>> 2. Integrate it into planner.findBestExp method, which would provide a
>>    centralized point for handling column alias preservation
>> 
>> This standardization would benefit all Calcite implementations by providing
>> a consistent and reliable way to handle column aliases throughout the
>> optimization process.
>> 
>> Julian, Mihai, would you agree with this approach?
>> 
>> suibianwanwan <suibianwanwa...@foxmail.com> wrote on Thu, Jun 26, 2025 at 18:07:
>> 
>>> 1. I think so.
>>> 
>>> 2. In my view, as long as we ensure the top-level Project is restored
>>> after Planner (some Calcite users might output RelNode), it should be fine.
>>> 
>>> 3. RelBuilder#project will optimize away identity projections. You can set
>>> force=true to force building a Project, or directly call
>>> LogicalProject#create.
>>> 
>>> I think we can add this utility method in Calcite:
>>> 1. When the top level is a Project, merge the Project to preserve aliases
>>> 2. When the top level is a Sort, call this method on its input
>>> 3. For other cases, directly add a Project to restore aliases
>>> 
>>> On 2025/06/26 07:24:48 Yanjing Wang wrote:
>>>>> 
>>>>> Dear Julian and Mihai,
>>>>> 
>>>>> Thank you both for your detailed and insightful responses. I'd like to
>>>>> confirm my understanding:
>>>>> 
>>>>> 1. Regarding column name preservation approach:
>>>>>    - If I understand correctly, using RelRoot to get a projected rel
>>>>>      node of the best rel would be the recommended way to preserve
>>>>>      column names of rel after optimization?
>>>>> 2. About subquery generation:
>>>>>    - I see that subquery generation is controlled by RelToSqlConverter,
>>>>>      so I should focus on making adjustments there to control the
>>>>>      subquery generation behavior for Project <- Sort rel pattern.
>>>>> 3. One observation I'd like to share:
>>>>>    - I noticed that when I tried using a rel builder to add a project
>>>>>      to the best rel (specifically when the best rel is a sort), adding
>>>>>      a project to the sort input rel doesn't seem to make a difference
>>>>>      in the outcome.
>>>>> 
>>>>> Could you please confirm if my understanding aligns with your
>>>>> suggestions? This would help ensure I'm moving in the right direction
>>>>> with the implementation.
>>>>> 
>>>>> Best regards,
>>>>> Yanjing
>>>> 
>>> 
>> 
>
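
The three-step utility method proposed in the quoted message from suibianwanwan (merge into a top-level Project, recurse through a top-level Sort, otherwise add a Project) could be sketched roughly as follows. This is a toy model under stated assumptions, not Calcite code: Rel, Scan, Project, Sort and restoreAliases are hypothetical stand-ins that track only field names, and the toy Project carries no expressions, so "merging" here just means replacing its names.

```java
import java.util.List;

// Toy stand-ins for a relational tree; NOT Calcite's RelNode classes.
final class AliasRestoreSketch {

    interface Rel {
        List<String> fieldNames();
    }

    record Scan(List<String> fieldNames) implements Rel {}

    // A real Project also holds expressions; this toy one holds names only.
    record Project(Rel input, List<String> fieldNames) implements Rel {}

    // In this toy model a Sort exposes the field names of its input.
    record Sort(Rel input) implements Rel {
        @Override public List<String> fieldNames() {
            return input.fieldNames();
        }
    }

    static Rel restoreAliases(Rel rel, List<String> aliases) {
        if (rel.fieldNames().size() != aliases.size()) {
            throw new IllegalArgumentException("field count mismatch");
        }
        if (rel.fieldNames().equals(aliases)) {
            return rel; // names already as expected, nothing to do
        }
        if (rel instanceof Project p) {
            // Step 1: top level is a Project, so fold the aliases into it.
            return new Project(p.input(), aliases);
        }
        if (rel instanceof Sort s) {
            // Step 2: top level is a Sort, so apply the method to its input.
            return new Sort(restoreAliases(s.input(), aliases));
        }
        // Step 3: any other node gets a new Project on top.
        return new Project(rel, aliases);
    }

    public static void main(String[] args) {
        Rel best = new Sort(new Scan(List.of("EXPR$0", "EXPR$1")));
        Rel restored = restoreAliases(best, List.of("id", "total"));
        System.out.println(restored.fieldNames()); // prints [id, total]
    }
}
```

Keeping the Sort on top and renaming beneath it means the renamed plan still ends in a Sort rather than in a Project over a Sort.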
