Re: What's the root cause of not supporting multiple aggregations in structured streaming?

Etienne Chauchot Fri, 04 Sep 2020 01:56:14 -0700

Hi Jungtaek Lim,

Nice to hear from you again since last time we talked :) and congrats on becoming a Spark committer in the meantime! (If I'm not mistaken, you were not one at the time.)

I totally agree with what you're saying on merging structural parts of Spark without having a broader consensus. What I don't understand is why there is not more investment in SS, especially since in another thread the community is discussing deprecating the regular DStream streaming framework.


Is the orientation of Spark now mostly batch ?

PS: yeah, I saw your update to the doc when I took a look at 3.0 preview 2 while searching for this particular feature. Regarding the workaround, I'm not sure it meets my needs, as it will add delays and may also interfere with watermarks.


Best

Etienne Chauchot


On 04/09/2020 08:06, Jungtaek Lim wrote:

Unfortunately I don't see enough active committers working on Structured Streaming; I don't expect major features/improvements can be brought in this situation.

Technically I can review and merge a PR for major improvements in SS, but that depends on how large a change the proposal makes. If the proposal brings a conceptual change, being reviewed by a single committer still wouldn't be enough.

So that's not because we think it's worthless. (That might be only me, though.) I'd understand it as there not being much investment in SS. There's also a known workaround for multiple aggregations (I've documented it in the SS guide doc, in the "Limitation of global watermark" section), though I totally agree the workaround is bad.

On Tue, Sep 1, 2020 at 12:28 AM Etienne Chauchot <[email protected]> wrote:


    Hi all,

    I'm also very interested in this feature, but the PR has been open
    since January 2019 and has not been updated. It raised a design
    discussion around watermarks, and a design doc was written
    (https://docs.google.com/document/d/1IAH9UQJPUiUCLd7H6dazRK2k1szDX38SnM6GVNZYvUo/edit#heading=h.npkueh4bbkz1).
    We also commented on this design, but no matter what, the subject
    still seems to be stale.

    Is there any interest in the community in delivering this feature,
    or is it considered worthless? If the latter, can you explain why?

    Best

    Etienne

    On 22/05/2019 03:38, 张万新 wrote:

    Thanks, I'll check it out.

    Arun Mahadevan <[email protected]> wrote on Tue, May 21, 2019
    at 01:31:

        Here's the proposal for supporting it in "append" mode -
        https://github.com/apache/spark/pull/23576. You could see if
        it addresses your requirement and post your feedback in the PR.
        For "update" mode it's going to be much harder to support this
        without first adding support for "retractions"; otherwise we
        would end up with wrong results.

        - Arun
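
To illustrate the point about "update" mode and retractions, here is a minimal sketch in plain Python. It is not Spark code and uses no Spark API; the function name `chained_sum` and the sample events are made up for illustration. It assumes a first aggregation (count per key) that emits an updated row each time a key's count changes, feeding a second aggregation (global sum of those counts).

```python
from collections import defaultdict


def chained_sum(events, with_retractions):
    """Simulate count-per-key in 'update' mode feeding a downstream sum.

    Hypothetical helper for illustration only; not part of any Spark API.
    """
    counts = defaultdict(int)  # state of the first aggregation: count per key
    total = 0                  # state of the second aggregation: sum of counts
    for key in events:
        old = counts[key]
        counts[key] = old + 1
        if with_retractions and old > 0:
            # Retract the row previously emitted for this key
            # before applying the updated one.
            total -= old
        # Apply the updated row emitted by the first aggregation.
        total += counts[key]
    return total


events = ["a", "b", "a", "a"]

# Without retractions, every updated count is added on top of the old one.
naive = chained_sum(events, with_retractions=False)

# With retractions, the old count is withdrawn first, so the sum stays right.
corrected = chained_sum(events, with_retractions=True)

print(naive, corrected, len(events))
```

With four input events, the sum without retractions comes out as 7, because the earlier counts for key "a" are never withdrawn, while the version with retractions gives 4, matching the number of events.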


        On Mon, 20 May 2019 at 01:34, Gabor Somogyi
        <[email protected]> wrote:

            There is a PR for this, but it is not yet merged.

            On Mon, May 20, 2019 at 10:13 AM 张万新
            <[email protected]> wrote:

                Hi there,

                I'd like to know the root reason why multiple
                aggregations on a streaming DataFrame are not allowed,
                since it's a very useful feature and Flink has
                supported it for a long time.

                Thanks.
