Re: [Discuss] Make flattening on Struct/Row optional

Stamatis Zampetakis Wed, 11 Dec 2019 22:55:05 -0800

So basically thanks to Igor :)

On Wed, Dec 11, 2019 at 9:56 PM Rui Wang <[email protected]> wrote:


> Thanks for the suggestion, Stamatis. Indeed, a recent effort in [1] enhanced
> the support for reconstructing ROW in the top SELECT, which is supposed to
> solve the problem.
>
>
>
> [1]: https://jira.apache.org/jira/browse/CALCITE-3138
>
> On Mon, Dec 9, 2019 at 3:21 PM Rui Wang <[email protected]> wrote:
>
> > Hello,
> >
> > Sorry for the long delay on this thread. Recently I again heard requests
> > in BeamSQL for a way to handle STRUCT without flattening it. I also
> > realized that Flink has already disabled flattening in their codebase [1].
> > I tried removing STRUCT flattening and running the Calcite core unit tests
> > to see how many tests break: it was 25, which wasn't that bad. So I would
> > like to pick up this effort again.
> >
> > Before I do it, I just want to ask whether the Calcite community supports
> > this effort (or thinks it is a good idea).
> >
> > My current execution plan is the following:
> > 1. Add a new flag to FrameworkConfig that specifies whether to flatten
> > STRUCT. By default, it is yes.
> > 2. With the struct flattener disabled, add more tests that exercise STRUCT
> > support in general, for example in projections, join conditions,
> > filters, etc. If something breaks, try to fix it.
> > 3. Check the 25 failed tests above and see why they fail when the struct
> > flattener is gone. Duplicate those failed tests with the necessary fixes
> > so that they pass without STRUCT flattening.
> >
> >
> > [1]:
> > https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/calcite/FlinkPlannerImpl.scala#L166
> >
> >
> > -Rui
> >
> > On Wed, Sep 5, 2018 at 11:59 AM Julian Hyde <[email protected]> wrote:
> >
> >> It might not be minor, but it’s worth a try. At optimization time we
> >> treat all fields as fields, regardless of whether they have complex
> >> types (maps, arrays, multisets, records) so there should not be too many
> >> problems. The flattening was mainly for the benefit of the runtime.
> >>
> >>
> >> > On Sep 5, 2018, at 11:32 AM, Rui Wang <[email protected]> wrote:
> >> >
> >> > Thanks for your helpful response! It seems like disabling the
> >> > flattening will at least affect some rules in optimization. It might
> >> > not be a minor change.
> >> >
> >> >
> >> > -Rui
> >> >
> >> > On Wed, Sep 5, 2018 at 4:54 AM Stamatis Zampetakis <[email protected]>
> >> > wrote:
> >> >
> >> >> Hi Rui,
> >> >>
> >> >> Disabling flattening in some cases seems reasonable.
> >> >>
> >> >> If I am not mistaken, even in the existing code it is not used all the
> >> >> time, so it makes sense to make it configurable.
> >> >> For example, Calcite prepared statements (CalcitePrepareImpl) use the
> >> >> flattener only for DDL operations that create materialized views (and
> >> >> this is because this code at some point passes through PlannerImpl).
> >> >> On the other hand, any query that uses the Planner will also pass
> >> >> through the flattener.
> >> >>
> >> >> Disabling the flattener does not mean that all rules will work without
> >> >> problems. The Javadoc of RelStructuredTypeFlattener at some point says
> >> >> "This approach has the benefit that real optimizer and codegen rules
> >> >> never have to deal with structured types.". Because of this, it is very
> >> >> likely that some rules were written on the assumption that there are no
> >> >> structured types.
> >> >>
> >> >> Best,
> >> >> Stamatis
> >> >>
> >> >>
> >> >> On Wed, Sep 5, 2018 at 9:48 AM, Julian Hyde <[email protected]>
> >> >> wrote:
> >> >>
> >> >>> Flattening was introduced mainly because the original engine used flat
> >> >>> column-oriented storage. Now we have several ways of executing,
> >> >>> including generating Java code.
> >> >>>
> >> >>> Adding a mode to disable flattening might make sense.
> >> >>> On Tue, Sep 4, 2018 at 12:52 PM Rui Wang <[email protected]>
> >> >>> wrote:
> >> >>>>
> >> >>>> Hi Community,
> >> >>>>
> >> >>>> While trying to support the Row type in Apache Beam SQL on top of
> >> >>>> Calcite, I realized that the Row-flattening logic causes the
> >> >>>> structure information of a Row to be lost after projections. There is
> >> >>>> a use case where users want to mix the Beam programming model with
> >> >>>> Beam SQL to process a dataset. The following is an example of the use
> >> >>>> case:
> >> >>>>
> >> >>>> dataset.apply(something user defined)
> >> >>>>            .apply(SELECT ...)
> >> >>>>            .apply(something user defined)
> >> >>>>
> >> >>>> As you can see, after the SQL statement is applied, the data
> >> >>>> structure should be preserved for further processing.
> >> >>>>
> >> >>>> The most straightforward way, to me, is to make Struct flattening
> >> >>>> optional so that I can disable it and the Row structure is preserved.
> >> >>>> Can I ask whether it is feasible to make this happen? What would
> >> >>>> happen if Calcite simply did not flatten Struct in the flattener? (I
> >> >>>> tried to disable it but got exceptions in the optimizer. I wasn't
> >> >>>> sure whether that was a minor thing to fix or whether Struct
> >> >>>> flattening was a design choice, so that the impact of the change
> >> >>>> would be huge.)
> >> >>>>
> >> >>>> Additionally, if there is a way to keep information that I can use
> >> >>>> to reconstruct the Row after projections, that might be OK as well.
> >> >>>> Does this idea exist in Calcite? If it does not, how does it compare
> >> >>>> with disabling Struct flattening?
> >> >>>>
> >> >>>> Thanks,
> >> >>>> Rui
> >> >>>
> >> >>
> >>
> >>
>
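
An illustrative sketch, not part of the original messages: the problem raised
at the start of the thread can be shown in plain Java, without any Calcite
APIs. Nested maps stand in for STRUCT/ROW values, and a hypothetical
`flatten` helper expands them into top-level columns. The class name, the
`$` separator and the sample fields are all assumptions made for this example;
this is not how RelStructuredTypeFlattener is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlattenSketch {

    // Hypothetical helper: expands nested maps (standing in for STRUCT/ROW
    // values) into top-level columns named parent$child.
    public static Map<String, Object> flatten(Map<String, Object> row) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto("", row, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void flattenInto(String prefix,
                                    Map<String, Object> row,
                                    Map<String, Object> out) {
        for (Map.Entry<String, Object> e : row.entrySet()) {
            String name = prefix.isEmpty()
                ? e.getKey()
                : prefix + "$" + e.getKey();
            if (e.getValue() instanceof Map) {
                // A nested struct: recurse instead of emitting one column.
                flattenInto(name, (Map<String, Object>) e.getValue(), out);
            } else {
                out.put(name, e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Athens");
        address.put("zip", "10552");

        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("address", address);

        // Flattened: "address" no longer exists as a single field.
        System.out.println(flatten(row));
        // Not flattened: the nested value is still one field.
        System.out.println(row);
    }
}
```

After flattening, a downstream step that expects a single `address` field no
longer finds one, which is the situation described for the
`dataset.apply(...).apply(SELECT ...).apply(...)` pipeline above.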
