waynexia commented on pull request #792: URL: https://github.com/apache/arrow-datafusion/pull/792#issuecomment-901203114
Sorry for the late update. I'm afraid I got some problems when trying to fix the last two failed cases ([Logs here][1]) . They failed in the same place where the result table's column name doesn't match. The optimized will produce a column with qualified name which isn't expected. Like `test.c1 PLUS test.c2` but the expected is `c1 PLUS c2`. When performing common subexpression eliminate optimization, the common expression will be replaced by a single `Column` expression which references to the pre-calculated column in the input plan. This `Column`'s name is a generated "identifier" that is used to identify common subexpressions. And in many other places we will use an expr's name generated by `Expr::name()`. Like in the projection push-down optimizer, we use expr's name to represent a column. Therefore there should have an `Alias` expr after the `Column` expr, or we will get a strange string (the "identifier") when trying to naming the rewrote expr. Everything works as expected so far. It becomes wired when comes to physical plan phase. In this phase, `Expr` will be converted to `PhysicalExpr` with a new name generated by `physical_name()`. This new name will become the result column's name, like `c1 PLUS c2` mentioned before. But for those who are rewrote, the expr is just a `col().alias()`, not a real expr. Its physical name is directly obtained from the alias, which is generated by `Expr::name()`. And for the root cause here, `Expr::name()` gives `Column` expr flat name (`test.c1`) and `physical_name()` gives unqualified name (`c1`). Finally we got `test.c1 PLUS test.c2` rather than `c1 PLUS c2`. I've tried a few ways to work around this but they end up breaking other parts. I haven't come up with an elegant solution so far. It's really appreciated if someone could put up some discussion on this. [1]: https://github.com/apache/arrow-datafusion/pull/792/checks?check_run_id=3252296013#step:4:1370 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
