Yaliang, I think this is a fantastic idea, and I agree that code maintenance is a real cost. My concern is that a smaller, separate project could get abandoned, especially if it had a smaller following. One of the nice things about Heron is the large community and the list of core contributors behind it. But I don't want to abandon this idea either. For me at least, it would make sense to get Storm SQL running in Heron first, then take what we learn from that experience and apply it to a third-party project if there is a need or demand for it. What do you think?
-Josh

On Mon, Feb 26, 2018 at 6:51 PM, Yaliang Wang <[email protected]> wrote:

> Sounds like a great feature to have. A question I have: would it be
> feasible to start a separate project to support SQL on Heron-like
> streaming systems?
>
> - I'm imagining that there will be a lot of code similar or identical
>   to Storm SQL.
> - Only the last of the three steps you mentioned (parse SQL ->
>   logical/physical plan -> Heron topology) is specific to Heron. The
>   first two steps could be shared with other Heron-like streaming
>   vendors.
> - Native SQL support inside the Heron project would give an extra
>   advertising/marketing bonus, but it would increase the code
>   maintenance cost, especially if it requires APIs that are not very
>   popular and may change over time. A separate project, however, can
>   target a specific version of Heron.
>
> Best,
> Yaliang
>
> > On Feb 26, 2018, at 12:48 PM, Eren Avsarogullari <[email protected]> wrote:
> >
> > +1 for Heron SQL support. Thanks, Josh.
> >
> > On 26 February 2018 at 18:42, Karthik Ramasamy <[email protected]> wrote:
> >
> >> Thanks Josh for initiating this. It will be a great feature to add
> >> to Heron.
> >>
> >> cheers
> >> /karthik
> >>
> >>> On Feb 26, 2018, at 11:11 AM, Josh Fischer <[email protected]> wrote:
> >>>
> >>> Jerry,
> >>>
> >>> Great point. Let's keep things simple for the migration to make sure
> >>> the implementation is correct. Then we can modify from there.
> >>>
> >>> On Sun, Feb 25, 2018 at 11:28 PM, Jerry Peng <[email protected]> wrote:
> >>>
> >>>> Thanks Josh for taking the initiative to get this started! SQL on
> >>>> Heron will be a great feature! The plan sounds great to me. Let's
> >>>> first get an initial version of Heron SQL out, and then we can
> >>>> worry about custom/user-defined sources and sinks. We can even
> >>>> start talking about UDFs (user-defined functions) at that point!
> >>>>
> >>>> Best,
> >>>>
> >>>> Jerry
> >>>>
> >>>> On Sun, Feb 25, 2018 at 9:05 PM, Josh Fischer <[email protected]> wrote:
> >>>>
> >>>>> Please see this Google Drive link for adding comments. I will copy
> >>>>> and paste the Drive doc below as well.
> >>>>>
> >>>>> https://docs.google.com/document/d/1PxLCyR_H-mOgPjyFj3DhWXryKW21CH2zFWwzTnqjfEA/edit?usp=sharing
> >>>>>
> >>>>> Proposal Below
> >>>>>
> >>>>> I am writing this document to propose changes and to start
> >>>>> conversations on adding functionality similar to Storm SQL to
> >>>>> Heron. We would call it Heron SQL. After reviewing how the code is
> >>>>> structured in Storm, I have some suggestions and questions about
> >>>>> implementing it in the Heron code base.
> >>>>>
> >>>>> High-level overview of the code workflow (keeping it similar to Storm):
> >>>>>
> >>>>> - We would parse the SQL with Calcite to create the logical and
> >>>>>   physical plans.
> >>>>> - We would then convert the logical and physical plans to a Heron
> >>>>>   topology.
> >>>>> - We would then submit the Heron topology to the Heron system.
> >>>>>
> >>>>> Some thoughts on code structure and overall functionality:
> >>>>>
> >>>>> - I think we should place the Heron SQL code base in a top-level
> >>>>>   directory of the repo.
> >>>>> - I will have to add the “sql” command to the Heron command-line
> >>>>>   code in Python.
> >>>>> - As a first-pass implementation, users can interact with Heron SQL
> >>>>>   via the following command:
> >>>>>     heron sql <sql-file> <topology-name>
> >>>>> - We will also support an explain command that displays the query
> >>>>>   plan without deploying the topology:
> >>>>>     heron sql <sql-file> --explain
> >>>>> - After the first-pass implementation is working smoothly, we can
> >>>>>   add an interactive command-line interface that accepts SQL on the
> >>>>>   fly when the SQL file argument is omitted:
> >>>>>     heron sql <topology-name>
> >>>>> - We would support all of the existing functionality in Storm SQL
> >>>>>   today, except for its dependency on Trident. Heron SQL would be
> >>>>>   another way to deploy topologies into Heron, similar to how you
> >>>>>   deploy topologies with the Streamlet, Topology, and ECO APIs.
> >>>>>
> >>>>> Questions:
> >>>>>
> >>>>> - Do we see any issues with this implementation plan?
> >>>>> - I believe we would at times have to supply an external jar to
> >>>>>   connect to external data sources, for example to reuse Kafka
> >>>>>   libraries or database drivers. I see that Storm has a few
> >>>>>   external connectors for MongoDB, Kafka, Redis, and HDFS. Do we
> >>>>>   want to limit users to the connectors we decide to build, or do
> >>>>>   we want to give them the ability to load external jars at submit
> >>>>>   time? I don't think Heron offers a way to pass extra jars the way
> >>>>>   Storm does today with its “--jars” or “--artifacts” flags. Would
> >>>>>   flags like these be the correct way to pull in external jars?
> >>>>>   Does anyone have a different idea? I'm thinking this might be a
> >>>>>   v2 feature after we get Heron SQL working well. Ideas, thoughts,
> >>>>>   or concerns?
> >>>>> - Is there anything I missed?
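The proposal describes three forms of the new command: a SQL file plus a topology name to submit, “--explain” to print the query plan without deploying, and, later, a topology name alone for an interactive session. A hypothetical sketch in Python (the language of the Heron CLI) can show how those forms might be told apart. This is not Heron's actual CLI code. The names SqlCommand and parse_sql_command are made up for illustration, and the sketch only decides which mode was requested. It does not parse SQL, call Calcite, or submit anything.

```python
import argparse
from dataclasses import dataclass
from typing import Optional, Sequence


# Hypothetical sketch of the proposed "heron sql" argument handling;
# this is not Heron's actual CLI code.
@dataclass
class SqlCommand:
    mode: str                     # "submit", "explain", or "interactive"
    sql_file: Optional[str]       # path to the SQL file, if one was given
    topology_name: Optional[str]  # topology to deploy, if one was given


def parse_sql_command(argv: Sequence[str]) -> SqlCommand:
    parser = argparse.ArgumentParser(prog="heron sql")
    parser.add_argument(
        "first", help="<sql-file>, or <topology-name> in interactive mode")
    parser.add_argument(
        "second", nargs="?", help="<topology-name> when a SQL file is given")
    parser.add_argument(
        "--explain", action="store_true",
        help="display the query plan without deploying the topology")
    args = parser.parse_args(list(argv))

    if args.explain:
        # heron sql <sql-file> --explain
        if args.second is not None:
            parser.error("--explain takes only <sql-file>")  # exits with status 2
        return SqlCommand("explain", sql_file=args.first, topology_name=None)
    if args.second is not None:
        # heron sql <sql-file> <topology-name>
        return SqlCommand("submit", sql_file=args.first, topology_name=args.second)
    # heron sql <topology-name>: the proposed later, interactive mode
    return SqlCommand("interactive", sql_file=None, topology_name=args.first)


print(parse_sql_command(["query.sql", "my-topology"]))
# SqlCommand(mode='submit', sql_file='query.sql', topology_name='my-topology')
```

With a single positional argument, the parser must treat it as the SQL file when “--explain” is present and as the topology name otherwise. The explain form and the proposed interactive form therefore share the same argument shape, which is worth keeping in mind if the interactive mode is added later.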
