Re: [Discuss] Make flattening on Struct/Row optional

Rui Wang Wed, 11 Dec 2019 12:57:13 -0800

Thanks for the suggestion, Stamatis. Indeed, a recent effort in [1] improved
the support for reconstructing ROW in the top-level SELECT, which is supposed
to solve the problem.
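
To make the terms concrete, here is a toy Java sketch of what flattening does to a nested ROW and how a top-level projection could rebuild it from the original shape. This is not Calcite code: the class and method names are made up for illustration, and nested rows are assumed to be modelled as plain Lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: a row is a List, and a nested row is a List inside it. */
public class RowFlattenSketch {

    /** Flattens a nested row into its scalar leaf values, depth first. */
    public static List<Object> flatten(List<?> row) {
        List<Object> out = new ArrayList<>();
        for (Object field : row) {
            if (field instanceof List) {
                out.addAll(flatten((List<?>) field)); // nested row: inline its fields
            } else {
                out.add(field);                       // scalar: keep as-is
            }
        }
        return out;
    }

    /**
     * Rebuilds a nested row from flat values. The shape argument is a template:
     * a List in the template marks a nested row, anything else marks a scalar slot.
     */
    public static List<Object> reconstruct(List<?> shape, List<?> flat) {
        int[] pos = {0}; // cursor into the flat values, shared across recursion
        List<Object> row = build(shape, flat, pos);
        if (pos[0] != flat.size()) {
            throw new IllegalArgumentException("flat values do not match the shape");
        }
        return row;
    }

    private static List<Object> build(List<?> shape, List<?> flat, int[] pos) {
        List<Object> row = new ArrayList<>();
        for (Object field : shape) {
            if (field instanceof List) {
                row.add(build((List<?>) field, flat, pos)); // rebuild the nested row
            } else {
                row.add(flat.get(pos[0]++));                // consume the next flat value
            }
        }
        return row;
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.<Object>asList(
            1, Arrays.<Object>asList("a", Arrays.<Object>asList(2.5, true)), "z");
        List<Object> flat = flatten(row);
        System.out.println(flat);
        System.out.println(reconstruct(row, flat));
    }
}
```

The point of the sketch is that the flat form alone cannot be turned back into a ROW; the shape has to be kept somewhere so the final projection can use it.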

[1]: https://jira.apache.org/jira/browse/CALCITE-3138

On Mon, Dec 9, 2019 at 3:21 PM Rui Wang <amaliu...@apache.org> wrote:

> Hello,
>
> Sorry for the long delay on this thread. Recently I again heard requests
> about how to deal with STRUCT without flattening it in BeamSQL. Also, I
> realized Flink has already disabled flattening in their codebase[1]. I tried
> removing STRUCT flattening and running the unit tests of Calcite core to see
> how many tests break: it was 25, which wasn't that bad. So I would like to
> pick up this effort again.
>
> Before I do it, I just want to ask whether the Calcite community supports
> this effort (or thinks it is a good idea).
>
> My current execution plan is the following:
> 1. Add a new flag to FrameworkConfig to specify whether to flatten STRUCT.
> By default, flattening stays enabled.
> 2. With the struct flattener disabled, add more tests to cover STRUCT support
> in general. For example, test STRUCT support in projections, join conditions,
> filters, etc. If something breaks, try to fix it.
> 3. Check the 25 failed tests above and see why they fail when the struct
> flattener is gone. Duplicate those failed tests, with the necessary fixes, to
> make sure they pass without STRUCT flattening.
>
>
> [1]:
> https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/calcite/FlinkPlannerImpl.scala#L166
>
>
> -Rui
>
> On Wed, Sep 5, 2018 at 11:59 AM Julian Hyde <jh...@apache.org> wrote:
>
>> It might not be minor, but it’s worth a try. At optimization time we
>> treat all fields as fields, regardless of whether they have complex types
>> (maps, arrays, multisets, records) so there should not be too many
>> problems. The flattening was mainly for the benefit of the runtime.
>>
>>
>> > On Sep 5, 2018, at 11:32 AM, Rui Wang <ruw...@google.com.INVALID> wrote:
>> >
>> > Thanks for your helpful response! It seems like disabling the flattening
>> > will at least affect some rules in optimization. It might not be a minor
>> > change.
>> >
>> >
>> > -Rui
>> >
>> > On Wed, Sep 5, 2018 at 4:54 AM Stamatis Zampetakis <zabe...@gmail.com>
>> > wrote:
>> >
>> >> Hi Rui,
>> >>
>> >> Disabling flattening in some cases seems reasonable.
>> >>
>> >> If I am not mistaken, even in the existing code it is not used all the
>> >> time, so it makes sense for it to become configurable.
>> >> For example, Calcite prepared statements (CalcitePrepareImpl) use the
>> >> flattener only for DDL operations that create materialized views (and
>> >> this is because this code at some point passes through the PlannerImpl).
>> >> On the other hand, any query that uses the Planner will also pass
>> >> through the flattener.
>> >>
>> >> Disabling the flattener does not mean that all rules will work without
>> >> problems. The Javadoc of the RelStructuredTypeFlattener at some point
>> >> says "This approach has the benefit that real optimizer and codegen
>> >> rules never have to deal with structured types.". Due to this, it is
>> >> very likely that some rules were written on the assumption that there
>> >> are no structured types.
>> >>
>> >> Best,
>> >> Stamatis
>> >>
>> >>
>> >> On Wed, Sep 5, 2018 at 9:48 AM, Julian Hyde <jh...@apache.org> wrote:
>> >>
>> >>> Flattening was introduced mainly because the original engine used flat
>> >>> column-oriented storage. Now we have several ways of executing,
>> >>> including generating Java code.
>> >>>
>> >>> Adding a mode to disable flattening might make sense.
>> >>> On Tue, Sep 4, 2018 at 12:52 PM Rui Wang <ruw...@google.com.invalid>
>> >>> wrote:
>> >>>>
>> >>>> Hi Community,
>> >>>>
>> >>>> While trying to support the Row type in Apache Beam SQL on top of
>> >>>> Calcite, I realized that the Row-flattening logic causes the structure
>> >>>> information of a Row to be lost after projections. There is a use case
>> >>>> where users want to mix the Beam programming model with Beam SQL to
>> >>>> process a dataset. The following is an example of the use case:
>> >>>>
>> >>>> dataset.apply(something user defined)
>> >>>>            .apply(SELECT ...)
>> >>>>            .apply(something user defined)
>> >>>>
>> >>>> As you can see, after the SQL statement is applied, the data
>> >>>> structure should be preserved for further processing.
>> >>>>
>> >>>> The most straightforward way, to me, is to make Struct flattening
>> >>>> optional so I could choose to disable it and preserve the Row
>> >>>> structure. Can I ask if it is feasible to make that happen? What could
>> >>>> happen if Calcite just doesn't flatten Struct in the flattener? (I
>> >>>> tried to disable it but got exceptions in the optimizer. I wasn't sure
>> >>>> whether that was a minor thing to fix or whether Struct flattening was
>> >>>> a design choice, so that the impact of the change would be huge.)
>> >>>>
>> >>>> Additionally, if there is a way to keep the information that I can
>> >>>> use to reconstruct the Row after projections, that might be OK as
>> >>>> well. Does this idea exist in Calcite? If it does not, how does it
>> >>>> compare with disabling Struct flattening?
>> >>>>
>> >>>> Thanks,
>> >>>> Rui
>> >>>
>> >>
>>
>>
