So basically thanks to Igor :)

On Wed, Dec 11, 2019 at 9:56 PM Rui Wang <[email protected]> wrote:
> Thanks for Stamatis's suggestion. Indeed, a recent effort in [1] enhanced the
> support that reconstructs ROW in the top SELECT, which is supposed to solve
> the problem.
>
> [1]: https://jira.apache.org/jira/browse/CALCITE-3138
>
> On Mon, Dec 9, 2019 at 3:21 PM Rui Wang <[email protected]> wrote:
>
> > Hello,
> >
> > Sorry for the long delay on this thread. Recently I heard about requests
> > on how to deal with STRUCT without flattening it again in BeamSQL. Also I
> > realized Flink has already disabled it in their codebase[1]. I did try to
> > remove STRUCT flattening and run the unit tests of Calcite core to see how
> > many tests break: it was 25, which wasn't that bad. So I would like to
> > pick up this effort again.
> >
> > Before I do it, I just want to ask whether the Calcite community supports
> > this effort (or thinks it is a good idea)?
> >
> > My current execution plan is the following:
> > 1. Add a new flag to FrameworkConfig to specify whether to flatten STRUCT.
> > By default, it is yes.
> > 2. With the struct flattener disabled, add more tests to test STRUCT
> > support in general. For example, test STRUCT support on projection, join
> > condition, filtering, etc. If something breaks, try to fix it.
> > 3. Check the 25 failed tests above and see why they fail when the struct
> > flattener is gone. Duplicate those failed tests, with the necessary fixes
> > to make sure they pass without STRUCT flattening.
> >
> > [1]:
> > https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/calcite/FlinkPlannerImpl.scala#L166
> >
> > -Rui
> >
> > On Wed, Sep 5, 2018 at 11:59 AM Julian Hyde <[email protected]> wrote:
> >
> >> It might not be minor, but it’s worth a try. At optimization time we
> >> treat all fields as fields, regardless of whether they have complex types
> >> (maps, arrays, multisets, records) so there should not be too many
> >> problems.
> >> The flattening was mainly for the benefit of the runtime.
> >>
> >> > On Sep 5, 2018, at 11:32 AM, Rui Wang <[email protected]> wrote:
> >> >
> >> > Thanks for your helpful response! It seems like disabling the flattening
> >> > will at least affect some rules in optimization. It might not be a minor
> >> > change.
> >> >
> >> > -Rui
> >> >
> >> > On Wed, Sep 5, 2018 at 4:54 AM Stamatis Zampetakis <[email protected]>
> >> > wrote:
> >> >
> >> >> Hi Rui,
> >> >>
> >> >> Disabling flattening in some cases seems reasonable.
> >> >>
> >> >> If I am not mistaken, even in the existing code it is not used all the
> >> >> time, so it makes sense for it to become configurable.
> >> >> For example, Calcite prepared statements (CalcitePrepareImpl) are using
> >> >> the flattener only for DDL operations that create materialized views
> >> >> (and this is because this code at some point passes through the
> >> >> PlannerImpl).
> >> >> On the other hand, any query that is using the Planner will also pass
> >> >> through the flattener.
> >> >>
> >> >> Disabling the flattener does not mean that all rules will work without
> >> >> problems. The Javadoc of the RelStructuredTypeFlattener at some point
> >> >> says "This approach has the benefit that real optimizer and codegen
> >> >> rules never have to deal with structured types.". Due to this, it is
> >> >> very likely that some rules were written based on the fact that there
> >> >> are no structured types.
> >> >>
> >> >> Best,
> >> >> Stamatis
> >> >>
> >> >> On Wed, Sep 5, 2018 at 9:48 AM, Julian Hyde <[email protected]> wrote:
> >> >>
> >> >>> Flattening was introduced mainly because the original engine used flat
> >> >>> column-oriented storage. Now we have several ways of executing,
> >> >>> including generating Java code.
> >> >>>
> >> >>> Adding a mode to disable flattening might make sense.
> >> >>>
> >> >>> On Tue, Sep 4, 2018 at 12:52 PM Rui Wang <[email protected]>
> >> >>> wrote:
> >> >>>>
> >> >>>> Hi Community,
> >> >>>>
> >> >>>> While trying to support the Row type in Apache Beam SQL on top of
> >> >>>> Calcite, I realized that the Row-flattening logic causes the structure
> >> >>>> information of a Row to be lost after projections. There is a use case
> >> >>>> where users want to mix the Beam programming model with Beam SQL to
> >> >>>> process a dataset. The following is an example of the use case:
> >> >>>>
> >> >>>> dataset.apply(something user defined)
> >> >>>>     .apply(SELECT ...)
> >> >>>>     .apply(something user defined)
> >> >>>>
> >> >>>> As you can see, after the SQL statement is applied, the data structure
> >> >>>> should be preserved for further processing.
> >> >>>>
> >> >>>> The most straightforward way to me is to make Struct flattening
> >> >>>> optional so I could choose to disable it and the Row structure is
> >> >>>> preserved. Can I ask if it is feasible to make it happen? What could
> >> >>>> happen if Calcite just doesn't flatten Struct in the flattener? (I
> >> >>>> tried to disable it but had exceptions in the optimizer. I wasn't sure
> >> >>>> whether that was some minor thing to fix, or whether Struct flattening
> >> >>>> was a design choice so the impact of the change would be huge.)
> >> >>>>
> >> >>>> Additionally, if there is a way to keep the information that I can use
> >> >>>> to reconstruct the Row after projections, it might be ok as well. Does
> >> >>>> this idea exist in Calcite? If it does not exist, how does this idea
> >> >>>> compare with disabling Struct flattening?
> >> >>>>
> >> >>>> Thanks,
> >> >>>> Rui
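For readers skimming the thread, here is a rough sketch of the problem described in the original message: once a nested Row is flattened into top-level columns, the nesting is gone unless the field paths are kept somewhere. This is plain Python with dicts standing in for rows, not Calcite or Beam code. The names `flatten_row`, `column_names` and `rebuild_row` are hypothetical, and the `$` separator is just an arbitrary choice for the illustration.

```python
# Illustrative only: dicts stand in for rows. None of these functions
# are Calcite or Beam APIs; all names here are hypothetical.

def flatten_row(row, path=()):
    """Flatten nested dicts into one level, keyed by the full field path."""
    flat = {}
    for name, value in row.items():
        field_path = path + (name,)
        if isinstance(value, dict):
            # Recurse into the nested struct, extending the path.
            flat.update(flatten_row(value, field_path))
        else:
            flat[field_path] = value
    return flat


def column_names(flat):
    """Join each path into a single flat column name ('$' is arbitrary)."""
    return ["$".join(field_path) for field_path in flat]


def rebuild_row(flat):
    """Reconstruct the nested row from the path-keyed flat form."""
    row = {}
    for field_path, value in flat.items():
        target = row
        for name in field_path[:-1]:
            target = target.setdefault(name, {})
        target[field_path[-1]] = value
    return row


row = {"id": 1, "address": {"city": "Paris", "zip": "75001"}}
flat = flatten_row(row)

# With only flat column names, the nesting is no longer visible.
names_only = dict(zip(column_names(flat), flat.values()))

# With the paths kept alongside the values, the row can be rebuilt.
rebuilt = rebuild_row(flat)
```

In this toy model, `names_only` corresponds to what a downstream step sees after flattening, while `rebuild_row` corresponds to the alternative raised in the thread of keeping enough information to reconstruct the Row after projections.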
