Thanks for Stamatis's suggestion. Indeed, a recent effort [1] improved the support for reconstructing the ROW in the top-level SELECT, which should solve the problem.
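
For anyone joining the thread late, here is a minimal sketch of the issue in plain Python. It is not Calcite or Beam code: the names flatten, reconstruct and select_all, the flatten_structs flag, and the dotted column names are all hypothetical and only illustrate the idea. Flattening expands a nested struct into top-level columns, so the consumer after the SELECT no longer sees the Row structure unless it is either rebuilt at the top of the query or flattening is switched off.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def flatten(row: Row, prefix: str = "") -> List[Tuple[str, Any]]:
    """Expand nested struct fields into a flat list of (column, value) pairs.

    Dotted column names are an assumption made for this illustration only.
    """
    cols: List[Tuple[str, Any]] = []
    for name, value in row.items():
        path = prefix + name
        if isinstance(value, dict):
            cols.extend(flatten(value, path + "."))
        else:
            cols.append((path, value))
    return cols


def reconstruct(cols: List[Tuple[str, Any]]) -> Row:
    """Rebuild the nested row from flat columns (assumes no dots in field names)."""
    row: Row = {}
    for path, value in cols:
        *parents, leaf = path.split(".")
        node = row
        for parent in parents:
            node = node.setdefault(parent, {})
        node[leaf] = value
    return row


def select_all(row: Row, flatten_structs: bool = True) -> Any:
    """Hypothetical stand-in for SELECT *; the flag defaults to flattening."""
    return flatten(row) if flatten_structs else row


record = {"id": 1, "address": {"city": "Athens", "zip": "10431"}}

flat = select_all(record)                    # structure is gone here
rebuilt = reconstruct(flat)                  # rebuilt at the top of the query
kept = select_all(record, flatten_structs=False)  # flattening disabled
```

Here flat holds three top-level columns with no trace of the address struct, while rebuilt and kept both have the original nested shape. The first corresponds to reconstructing the ROW in the top SELECT, the second to disabling the flattener.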
[1]: https://jira.apache.org/jira/browse/CALCITE-3138

On Mon, Dec 9, 2019 at 3:21 PM Rui Wang <amaliu...@apache.org> wrote:

> Hello,
>
> Sorry for the long delay on this thread. Recently I heard about requests
> on how to deal with STRUCT without flattening it again in BeamSQL. Also I
> realized Flink has already disabled it in their codebase[1]. I did try to
> remove STRUCT flattening and run the unit tests of Calcite core to see how
> many tests break: it was 25, which wasn't that bad. So I would like to pick
> up this effort again.
>
> Before I do it, I just want to ask if the Calcite community supports this
> effort (or thinks it is a good idea)?
>
> My current execution plan is the following:
> 1. Add a new flag to FrameworkConfig to specify whether to flatten STRUCT.
>    By default, it is yes.
> 2. With the struct flattener disabled, add more tests for STRUCT support
>    in general. For example, test STRUCT support in projections, join
>    conditions, filters, etc. If something breaks, try to fix it.
> 3. Check the 25 failed tests above and see why they fail when the struct
>    flattener is gone. Duplicate those failed tests with the necessary fixes
>    to make sure they pass without STRUCT flattening.
>
> [1]:
> https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/calcite/FlinkPlannerImpl.scala#L166
>
> -Rui
>
> On Wed, Sep 5, 2018 at 11:59 AM Julian Hyde <jh...@apache.org> wrote:
>
>> It might not be minor, but it’s worth a try. At optimization time we
>> treat all fields as fields, regardless of whether they have complex types
>> (maps, arrays, multisets, records) so there should not be too many
>> problems. The flattening was mainly for the benefit of the runtime.
>>
>> > On Sep 5, 2018, at 11:32 AM, Rui Wang <ruw...@google.com.INVALID> wrote:
>> >
>> > Thanks for your helpful response! It seems like disabling the flattening
>> > will at least affect some rules in optimization.
>> > It might not be a minor change.
>> >
>> > -Rui
>> >
>> > On Wed, Sep 5, 2018 at 4:54 AM Stamatis Zampetakis <zabe...@gmail.com> wrote:
>> >
>> >> Hi Rui,
>> >>
>> >> Disabling flattening in some cases seems reasonable.
>> >>
>> >> If I am not mistaken, even in the existing code it is not used all the
>> >> time, so it makes sense for it to become configurable.
>> >> For example, Calcite prepared statements (CalcitePrepareImpl) use the
>> >> flattener only for DDL operations that create materialized views (and
>> >> this is because this code at some point passes through the PlannerImpl).
>> >> On the other hand, any query that uses the Planner will also pass
>> >> through the flattener.
>> >>
>> >> Disabling the flattener does not mean that all rules will work without
>> >> problems. The Javadoc of the RelStructuredTypeFlattener at some point
>> >> says "This approach has the benefit that real optimizer and codegen rules
>> >> never have to deal with structured types.". Due to this, it is very
>> >> likely that some rules were written based on the fact that there are no
>> >> structured types.
>> >>
>> >> Best,
>> >> Stamatis
>> >>
>> >> On Wed, Sep 5, 2018 at 9:48 AM, Julian Hyde <jh...@apache.org> wrote:
>> >>
>> >>> Flattening was introduced mainly because the original engine used flat
>> >>> column-oriented storage. Now we have several ways of executing,
>> >>> including generating Java code.
>> >>>
>> >>> Adding a mode to disable flattening might make sense.
>> >>> On Tue, Sep 4, 2018 at 12:52 PM Rui Wang <ruw...@google.com.invalid> wrote:
>> >>>>
>> >>>> Hi Community,
>> >>>>
>> >>>> While trying to support the Row type in Apache Beam SQL on top of
>> >>>> Calcite, I realized that the Row flattening logic loses the structure
>> >>>> information of a Row after projections. There is a use case where users
>> >>>> want to mix the Beam programming model with Beam SQL to process a
>> >>>> dataset.
>> >>>> The following is an example of the use case:
>> >>>>
>> >>>> dataset.apply(something user defined)
>> >>>>        .apply(SELECT ...)
>> >>>>        .apply(something user defined)
>> >>>>
>> >>>> As you can see, after the SQL statement is applied, the data structure
>> >>>> should be preserved for further processing.
>> >>>>
>> >>>> The most straightforward way to me is to make Struct flattening
>> >>>> optional, so I could choose to disable it and the Row structure is
>> >>>> preserved. Can I ask if it is feasible to make that happen? What could
>> >>>> happen if Calcite just doesn't flatten Struct in the flattener? (I tried
>> >>>> to disable it but got exceptions in the optimizer. I wasn't sure whether
>> >>>> that was some minor thing to fix or whether Struct flattening was a
>> >>>> design choice, so that the impact of the change would be huge.)
>> >>>>
>> >>>> Additionally, if there is a way to keep the information that I can use
>> >>>> to reconstruct the Row after projections, it might be ok as well. Does
>> >>>> this idea exist in Calcite? If it does not exist, how does this idea
>> >>>> compare with disabling Struct flattening?
>> >>>>
>> >>>> Thanks,
>> >>>> Rui