waynexia commented on pull request #792:
URL: https://github.com/apache/arrow-datafusion/pull/792#issuecomment-901203114


   Sorry for the late update. I'm afraid I got some problems when trying to fix 
the last two failed cases ([Logs here][1]) . 
   
   They failed in the same place where the result table's column name doesn't 
match. The optimized will produce a column with qualified name which isn't 
expected. Like `test.c1 PLUS test.c2` but the expected is `c1 PLUS c2`.
   
   When performing common subexpression eliminate optimization, the common 
expression will be replaced by a single `Column` expression which references to 
the pre-calculated column in the input plan. This `Column`'s name is a 
generated "identifier" that is used to identify common subexpressions. 
   
   And in many other places we will use an expr's name generated by 
`Expr::name()`. Like in the projection push-down optimizer, we use expr's name 
to represent a column. Therefore there should have an `Alias` expr after the 
`Column` expr, or we will get a strange string (the "identifier") when trying 
to naming the rewrote expr.
   
   Everything works as expected so far. It becomes wired when comes to physical 
plan phase. In this phase, `Expr` will be converted to `PhysicalExpr` with a 
new name generated by `physical_name()`. This new name will become the result 
column's name, like `c1 PLUS c2` mentioned before. But for those who are 
rewrote, the expr is just a `col().alias()`, not a real expr. Its physical name 
is directly obtained from the alias, which is generated by `Expr::name()`.
   
   And for the root cause here, `Expr::name()` gives `Column` expr flat name 
(`test.c1`) and `physical_name()` gives unqualified name (`c1`). Finally we got 
`test.c1 PLUS test.c2` rather than `c1 PLUS c2`.
   
   I've tried a few ways to work around this but they end up breaking other 
parts. I haven't come up with an elegant solution so far. It's really 
appreciated if someone could put up some discussion on this.
   
   [1]: 
https://github.com/apache/arrow-datafusion/pull/792/checks?check_run_id=3252296013#step:4:1370


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to