Sounds like a great feature to have. One question: would it be feasible to start a separate project to support SQL on Heron-like streaming systems?
- I imagine there will be a lot of code that is similar or identical to Storm SQL.
- Of the three steps you mentioned (parse SQL -> logical/physical plan -> Heron topology), only the last is specific to Heron. The first two could be shared with other Heron-like streaming systems.
- Native SQL support inside the Heron project would bring an extra advertising/marketing bonus, but it would also increase the code maintenance cost, especially if it depends on APIs that are not very popular and may change over time. A separate project, by contrast, could target a specific version of Heron.

Best,
Yaliang

> On Feb 26, 2018, at 12:48 PM, Eren Avsarogullari <[email protected]> wrote:
>
> +1 for Heron SQL Support. Thanks Josh.
>
> On 26 February 2018 at 18:42, Karthik Ramasamy <[email protected]> wrote:
>
>> Thanks Josh for initiating this. It will be a great feature to add for
>> Heron.
>>
>> cheers
>> /karthik
>>
>>> On Feb 26, 2018, at 11:11 AM, Josh Fischer <[email protected]> wrote:
>>>
>>> Jerry,
>>>
>>> Great point. Let's keep things simple for the migration to make sure the
>>> implementation is correct. Then we can modify from there.
>>>
>>> On Sun, Feb 25, 2018 at 11:28 PM, Jerry Peng <[email protected]>
>>> wrote:
>>>
>>>> Thanks Josh for taking the initiative to get this started! SQL on Heron
>>>> will be a great feature! The plan sounds great to me. Let's first get
>>>> an initial version of the Heron SQL out and then we can worry about
>>>> custom / user defined sources and sinks. We can even start talking
>>>> about UDFs (User defined functions) at that point!
>>>>
>>>> Best,
>>>>
>>>> Jerry
>>>>
>>>> On Sun, Feb 25, 2018 at 9:05 PM, Josh Fischer <[email protected]> wrote:
>>>>> Please see this google drive link for adding comments. I will copy and
>>>>> paste the drive doc below as well.
>>>>>
>>>>> https://docs.google.com/document/d/1PxLCyR_H-mOgPjyFj3DhWXryKW21CH2zFWwzTnqjfEA/edit?usp=sharing
>>>>>
>>>>> Proposal Below
>>>>>
>>>>> *I am writing this document to propose changes and to start conversations
>>>>> on adding functionality similar to Storm SQL to Heron. We would call it
>>>>> Heron SQL. After reviewing how the code is structured in Storm I have some
>>>>> suggestions and questions relating to the implementation into the Heron
>>>>> code base.
>>>>>
>>>>> - High Level Overview Of Code Workflow (Keeping Similar to Storm)
>>>>>   - We would parse the SQL with Calcite to create the logical and
>>>>>     physical plans
>>>>>   - We would then convert the logical and physical plans to a Heron
>>>>>     Topology
>>>>>   - We would then submit the Heron Topology into the Heron System
>>>>> - Some thoughts on code structure and overall functionality
>>>>>   - I think we should place the Heron SQL code base as a top level
>>>>>     directory in the repo.
>>>>>   - I will have to add the command “sql” to the Heron command line code
>>>>>     in Python.
>>>>>   - As a first pass implementation users can interact with Heron SQL via
>>>>>     the following command
>>>>>     - heron sql <sql-file> <topology-name>
>>>>>   - We will also support the explain command for displaying the query
>>>>>     plan; this will not deploy the topology
>>>>>     - heron sql <sql-file> --explain
>>>>>   - After the first pass implementation is working smoothly, we can then
>>>>>     add an interactive command line interface to accept SQL on the fly by
>>>>>     omitting the SQL file argument
>>>>>     - heron sql <topology-name>
>>>>>   - We would support all of the existing functionality in Storm SQL
>>>>>     today with the exception of being dependent on Trident. We would
>>>>>     use Storm SQL as a way to deploy topologies into Heron.
>>>>>     Similar to how you deploy topologies with the Streamlet,
>>>>>     Topology, and ECO APIs.
>>>>> - Questions
>>>>>   - Do we see any issue with this plan to implement?
>>>>>   - I believe we would have to supply an external jar at times to
>>>>>     connect to external data sources, such as reuse of Kafka libraries
>>>>>     or database drivers. I see that Storm has a few external connectors
>>>>>     for Mongo, Kafka, Redis and HDFS. Do we want to limit users to what
>>>>>     we decide to build as connectors or do we want to give them the
>>>>>     ability to load external jars at submit time? I don’t think Heron
>>>>>     offers the ability to pass extra jars via the “--jars” or
>>>>>     “--artifacts” flags like Storm does today. Would this be the
>>>>>     correct way to pull in external jars? Does anyone have a different
>>>>>     idea? I’m thinking that this might be a v2 feature after we get
>>>>>     Heron SQL working well. Ideas, thoughts or concerns?
>>>>>   - Is there anything I missed?*
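
The two first-pass commands in the proposal (heron sql <sql-file> <topology-name> and heron sql <sql-file> --explain) could be parsed roughly as follows. This is only a sketch using Python's standard argparse module, not existing Heron CLI code: the function names build_parser and parse_sql_command are hypothetical, and the rule that topology-name may be left out only when --explain is given is my reading of the proposal.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical `heron sql` subcommand."""
    parser = argparse.ArgumentParser(prog="heron")
    subparsers = parser.add_subparsers(dest="command", required=True)

    sql = subparsers.add_parser("sql", help="deploy or explain a SQL file")
    sql.add_argument("sql_file", metavar="sql-file")
    # Assumption: topology-name is optional only because --explain
    # prints the plan and does not deploy anything.
    sql.add_argument("topology_name", metavar="topology-name", nargs="?")
    sql.add_argument(
        "--explain",
        action="store_true",
        help="print the query plan without deploying the topology",
    )
    return parser


def parse_sql_command(argv):
    """Parse argv and enforce that a deploy request names a topology."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.explain and args.topology_name is None:
        # parser.error prints a usage message and exits with status 2.
        parser.error("topology-name is required unless --explain is given")
    return args
```

For example, parse_sql_command(["sql", "query.sql", "wordcount"]) yields a deploy request, while parse_sql_command(["sql", "query.sql", "--explain"]) yields an explain-only request with no topology name.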

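
To illustrate the point about sharing the first two steps, the three-step workflow can be sketched as an engine-neutral front end with a pluggable last step. Everything here is hypothetical: Plan, parse_and_plan, TopologyBackend, HeronBackend, and compile_sql are made-up names, the "planner" is a naive split on ';' standing in for Calcite, and the backend returns a plain dict rather than a real Heron topology.

```python
from dataclasses import dataclass
from typing import Protocol, Tuple


@dataclass(frozen=True)
class Plan:
    """Stand-in for the logical/physical plan a real planner would produce."""
    statements: Tuple[str, ...]


def parse_and_plan(sql_text: str) -> Plan:
    """Steps 1 and 2 (engine-neutral), reduced here to splitting on ';'."""
    statements = tuple(s.strip() for s in sql_text.split(";") if s.strip())
    return Plan(statements)


class TopologyBackend(Protocol):
    """Step 3 (engine-specific): turn a plan into something deployable."""

    def build(self, plan: Plan, name: str) -> dict: ...


class HeronBackend:
    """Hypothetical Heron-specific backend; a real one would emit a topology."""

    def build(self, plan: Plan, name: str) -> dict:
        return {"engine": "heron", "name": name, "stages": list(plan.statements)}


def compile_sql(sql_text: str, name: str, backend: TopologyBackend) -> dict:
    """Run the shared front end, then hand the plan to the chosen backend."""
    return backend.build(parse_and_plan(sql_text), name)
```

With this split, only HeronBackend would need to track a particular Heron version; another engine would supply its own class with the same build method.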