Re: Proposing Changes To Heron

Josh Fischer Mon, 26 Feb 2018 19:18:08 -0800

Yaliang,

I think this is a fantastic idea, and I agree that code maintenance is a
cost.   I am concerned that a smaller, separate project may get abandoned,
especially if it has a smaller following.   One of the nice things about
Heron is the large community and list of core contributors behind it.  But
I don't want to abandon this idea.  I think, for me at least, that it would
make sense to get Storm SQL running in Heron, then take what we learn from
that experience and apply it to a third-party project if there is a
need/demand for it.  What do you think?


-Josh

On Mon, Feb 26, 2018 at 6:51 PM, Yaliang Wang <[email protected]>
wrote:

> Sounds like a great feature to have. A question I have: would it be
> feasible to start a separate project to support SQL on Heron-like streaming?
>
> - I’m imagining that there will be a lot of code similar or identical to
> Storm SQL.
> - Only the last of the three steps you mentioned (parse sql ->
> logical/physical plan -> heron topology) is specific to Heron. The first
> two steps can be shared with other Heron-like streaming vendors.
> - Native support for SQL inside the Heron project will give an extra
> advertising/marketing bonus, but at an increased code maintenance cost,
> especially if it requires APIs that are not very popular and may change
> over time. A separate project, however, can target a specific version of
> Heron.
>
> Best,
> Yaliang
>
> > On Feb 26, 2018, at 12:48 PM, Eren Avsarogullari <
> [email protected]> wrote:
> >
> > +1 for Heron SQL Support. Thanks Josh.
> >
> > On 26 February 2018 at 18:42, Karthik Ramasamy <[email protected]>
> wrote:
> >
> >> Thanks Josh for initiating this. It will be a great feature to add for
> >> Heron.
> >>
> >> cheers
> >> /karthik
> >>
> >>> On Feb 26, 2018, at 11:11 AM, Josh Fischer <[email protected]>
> wrote:
> >>>
> >>> Jerry,
> >>>
> >>> Great point.  Let's keep things simple for the migration to make sure the
> >>> implementation is correct.  Then we can modify from there.
> >>>
> >>> On Sun, Feb 25, 2018 at 11:28 PM, Jerry Peng <
> >> [email protected]>
> >>> wrote:
> >>>
> >>>> Thanks Josh for taking the initiative to get this started!  SQL on Heron
> >>>> will be a great feature! The plan sounds great to me.  Let's first get
> >>>> an initial version of Heron SQL out and then we can worry about
> >>>> custom / user defined sources and sinks.  We can even start talking
> >>>> about UDFs (User defined functions) at that point!
> >>>>
> >>>> Best,
> >>>>
> >>>> Jerry
> >>>>
> >>>> On Sun, Feb 25, 2018 at 9:05 PM, Josh Fischer <[email protected]>
> >> wrote:
> >>>>> Please see this Google Drive link for adding comments.  I will copy and
> >>>>> paste the drive doc below as well.
> >>>>>
> >>>>> https://docs.google.com/document/d/1PxLCyR_H-mOgPjyFj3DhWXryKW21CH2zFWwzTnqjfEA/edit?usp=sharing
> >>>>>
> >>>>>
> >>>>> Proposal Below
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> I am writing this document to propose changes and to start conversations
> >>>>> on adding functionality similar to Storm SQL to Heron.  We would call it
> >>>>> Heron SQL.  After reviewing how the code is structured in Storm, I have
> >>>>> some suggestions and questions relating to the implementation in the
> >>>>> Heron code base.
> >>>>>
> >>>>> - High Level Overview Of Code Workflow (Keeping Similar to Storm)
> >>>>>   - We would parse the SQL with Calcite to create the logical and
> >>>>>     physical plans
> >>>>>   - We would then convert the logical and physical plans to a Heron
> >>>>>     Topology
> >>>>>   - We would then submit the Heron Topology into the Heron System
> >>>>> - Some thoughts on code structure and overall functionality
> >>>>>   - I think we should place the Heron SQL code base as a top level
> >>>>>     directory in the repo.
> >>>>>   - I will have to add the command “sql” to the Heron command line
> >>>>>     code in Python.
> >>>>>   - As a first pass implementation, users can interact with Heron SQL
> >>>>>     via the following command
> >>>>>     - heron sql <sql-file> <topology-name>
> >>>>>   - We will also support the explain command for displaying the query
> >>>>>     plan; this will not deploy the topology
> >>>>>     - heron sql <sql-file> --explain
> >>>>>   - After the first pass implementation is working smoothly, we can
> >>>>>     then add an interactive command line interface to accept SQL on
> >>>>>     the fly by omitting the sql file argument
> >>>>>     - heron sql <topology-name>
> >>>>>   - We would support all of the existing functionality in Storm SQL
> >>>>>     today with the exception of being dependent on Trident.  We would
> >>>>>     use Storm SQL as a way to deploy topologies into Heron, similar to
> >>>>>     how you deploy topologies with the Streamlet, Topology, and ECO
> >>>>>     APIs
> >>>>> - Questions
> >>>>>   - Do we see any issue with this plan to implement?
> >>>>>   - I believe we would have to supply an external jar at times to
> >>>>>     connect to external data sources, such as reuse of Kafka libraries
> >>>>>     or database drivers.  I see that Storm has a few external
> >>>>>     connectors for Mongo, Kafka, Redis and HDFS.  Do we want to limit
> >>>>>     users to what we decide to build as connectors, or do we want to
> >>>>>     give them the ability to load external jars at submit time?  I
> >>>>>     don’t think Heron offers the ability to pass extra jars via the
> >>>>>     “--jars” or “--artifacts” flags like Storm does today.  Would this
> >>>>>     be the correct way to pull in external jars?  Does anyone have a
> >>>>>     different idea?  I’m thinking that this might be a v2 feature
> >>>>>     after we get Heron SQL working well.  Ideas, thoughts or concerns?
> >>>>>   - Is there anything I missed?
> >>>>
> >>
> >>
>
>
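
The two first-pass command forms in the quoted proposal, `heron sql <sql-file> <topology-name>` and `heron sql <sql-file> --explain`, can be sketched as a small argument parser. This is only an illustration under stated assumptions: `build_parser` and `plan_action` are hypothetical names, not Heron's actual CLI code, and treating `<topology-name>` as optional when `--explain` is given is inferred from the proposal's examples rather than stated in it. The later interactive form (`heron sql <topology-name>`) is not covered.

```python
import argparse


def build_parser():
    """Build a hypothetical parser mirroring the proposed `heron sql` usage."""
    parser = argparse.ArgumentParser(prog="heron")
    subparsers = parser.add_subparsers(dest="command", required=True)

    sql = subparsers.add_parser("sql", help="run a SQL file as a topology")
    sql.add_argument("sql_file", help="path to the file containing SQL")
    # Assumption: topology-name is optional only because the proposal's
    # `heron sql <sql-file> --explain` form omits it.
    sql.add_argument("topology_name", nargs="?",
                     help="name for the deployed topology")
    sql.add_argument("--explain", action="store_true",
                     help="print the query plan without deploying")
    return parser


def plan_action(argv):
    """Return (action, sql_file, topology_name) for the given arguments."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.explain:
        # Explain mode only displays the plan; nothing is submitted.
        return ("explain", args.sql_file, args.topology_name)
    if args.topology_name is None:
        parser.error("topology-name is required unless --explain is given")
    return ("submit", args.sql_file, args.topology_name)
```

In this sketch, `plan_action(["sql", "query.sql", "my-topology"])` selects the submit path, while adding `--explain` selects the plan-only path; the actual parsing, planning, and submission steps are deliberately left out.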
