To keep this discussion from going stale, I'd like to note where Storm SQL stands now (including the changes since my technical analysis).
1. There are six pull requests open for Storm SQL: https://github.com/apache/storm/pulls/HeartSaVioR

2. Once STORM-2125 <https://github.com/apache/storm/pull/1714> is merged to master, Storm SQL will handle most of the features Calcite documents on its SQL reference page. I wrote Storm SQL's own reference page <https://github.com/HeartSaVioR/storm/blob/43478edbd7369047ccc417d6b81ab6a314910437/docs/storm-sql-reference.md> and submitted it in another pull request (one of the six). STORM-2125 has a +1 but is waiting for Calcite 1.10.0 to be released so it can pick up a bugfix. (Yes, one could argue that is a bit tightly coupled.) I expect they can cut a 1.10.0 RC1 next week; if that doesn't happen, we can stick with Calcite 1.9.0 and upgrade to 1.10.0 as soon as it is available.

3. While adding tests I found some bugs on the Calcite side (noted on each test and in the reference doc), which we can contribute back to the Calcite community. I think this makes for a positive feedback loop between the Storm and Calcite projects.

There is still plenty of room to work on in Storm SQL:

1. I haven't had much time for it yet, but I would like to learn Calcite and try to optimize Storm SQL via STORM-1446 <https://issues.apache.org/jira/browse/STORM-1446>. If someone who understands Calcite well takes over and contributes in this area, I would much appreciate it.

2. Trident does not seem to be a good backend API for Storm SQL in the long term: it doesn't support typed operations, and joins and aggregations are limited to a micro-batch, so results are non-deterministic. I'm waiting for the higher-level API (STORM-1961 <http://issues.apache.org/jira/browse/STORM-1961> is the start) and plan to move the backend onto it.

3. Storm SQL needs more connectors as data sources. That work isn't hard, just a bit time-consuming.

4. Storm SQL needs time to stabilize through an experiment loop: early adopters trying it out, reporting issues, and bugs getting fixed.

5. Longer term, we can expand the features Storm SQL supports. Joins between a streaming data source and a normal table would help enrich or filter data against other external data sources. There is also a continuing effort in the Calcite community, the 'Streaming SQL' work led by Julian, and so on.

Looking forward to continuing the discussion with the JW Player folks.

Thanks,
Jungtaek Lim (HeartSaVioR)

ps. I'd really like to make the revamped Storm SQL available to the community. Do we think the merge process will put this effort on hold? I hope not, but I will follow the community's decision.

On Sep 30, 2016, at 3:58 AM, P. Taylor Goetz <[email protected]> wrote:

FYI, I've merged the SQE code and documentation into the sqe_import branch:
https://github.com/apache/storm/tree/sqe_import

Note that the build will fail on the last component (storm-sqe) due to the compilation issues mentioned earlier.

-Taylor

On Sep 29, 2016, at 11:13 AM, Bobby Evans <[email protected]> wrote:

Agreed, or if we can find a way not to break compatibility (for example, a common base class holding most of the logic, with one subclass that uses a String and another that uses a byte array).

- Bobby

On Thursday, September 29, 2016 10:05 AM, Jungtaek Lim <[email protected]> wrote:

The change to storm-redis seems to require breaking backward compatibility, so I would love to see it come in as a separate pull request through the general review process. If it can be integrated into SQE without changing storm-redis, that would be nice. Does that make sense?

- Jungtaek Lim (HeartSaVioR)
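(For illustration: a minimal Java sketch of the backward-compatible split Bobby suggests above, assuming a generic base class. The class and method names here are hypothetical, not the actual storm-redis API.)

    import java.io.Serializable;
    import java.nio.charset.StandardCharsets;

    // Hypothetical sketch, not the real storm-redis API: shared key-handling
    // logic lives in a generic base class, and each subclass only decides how
    // the key is represented (String for existing users, byte[] for binary serde).
    public abstract class BaseRedisKeyMapper<K> implements Serializable {

        // Shared logic: every mapper ultimately hands Redis a byte[] key.
        public final byte[] encodeKey(K key) {
            return toBytes(key);
        }

        protected abstract byte[] toBytes(K key);
    }

    // Keeps today's String-based behaviour, so existing topologies stay compatible.
    class StringRedisKeyMapper extends BaseRedisKeyMapper<String> {
        @Override
        protected byte[] toBytes(String key) {
            return key.getBytes(StandardCharsets.UTF_8);
        }
    }

    // New code (e.g. a binary serde for SQE) can pass raw bytes through untouched.
    class BinaryRedisKeyMapper extends BaseRedisKeyMapper<byte[]> {
        @Override
        protected byte[] toBytes(byte[] key) {
            return key;
        }
    }

The point of such a split would be that existing String-based users see no API change, while a byte[]-based subclass covers the binary serde case discussed later in the thread.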
Taylor Goetz <[email protected]>님이 작성: The changes are available in GitHub, I just overlooked it. And they’re actually neatly contained in two commits: The changes to storm-kafka can be found here: https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572836104755a The changes to storm-redis are here: https://github.com/jwplayer/storm/commit/30d000d3ff673efa8b927d23e554a705fb2928b8 < https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572836104755a > The changes to storm-kafka are straightforward and implemented in such a way that they would be useful for use cases outside of SQE. As the commit message states, it adds a new kafka deserialization scheme (FullScheme) that includes the key, value, topic, partition and offset when reading from kafka, which is a feature I can see as being valuable for some use cases. I would be +1 for merging that code. The changes to storm-redis are a little different, as Morrigan pointed out, because it only addresses the Trident API, but IMHO it looks like a good direction. @HeartSavior — Would you have some time to take a look at the storm-redis changes and provide your opinion, since you’re one of the original authors of that code? -Taylor On Sep 26, 2016, at 6:28 PM, Jungtaek Lim <[email protected]> wrote: Great! For storm-redis, we might need to modify key/value mapper to use byte[] rather than String. When I co-authored storm-redis, I forgot considering binary format of serde. If we want to address that part, we can also address it. 2016년 9월 27일 (화) 오전 7:19, Morrigan Jones <[email protected]>님이 작성: Sure, when I can. Storm-kafka should be pretty easy. The storm-redis one will require more work to make it more complete. On Mon, Sep 26, 2016 at 6:09 PM, P. Taylor Goetz <[email protected]> wrote: Thanks for the explanation Morrigan! Would you be willing to provide a pull request or patch so the community can review? It sounds like at least some of the changes you mention could be useful to the broader community (beyond the SQL effort). Thanks again, -Taylor On Sep 26, 2016, at 4:40 PM, Morrigan Jones <[email protected]> wrote: storm-kafka - This is needed because storm-kafka does not provide a scheme class that gives you the key, value (payload), partition, and offset. MessageMetadataScheme.java comes comes closest, but is missing the key. This was a pretty simple change on my part. storm-redis - This is needed for proper support of Redis hashes. The existing storm-redis uses a static string (additionalKey in the RedisDataTypeDescription class) for the field name in hash types. I updated it to use a configurable KeyFactory for both the hash name and the field name. We also added some limited support for set types. This is admittedly the messiest between the two jars since we only cared about the trident states and would require a lot more changes to get storm-redis more "feature complete" overall. On Mon, Sep 26, 2016 at 4:03 PM, P. Taylor Goetz <[email protected]> wrote: Sounds good. I’ll find out if it builds against 2.x. If so I’ll go that direction. Otherwise I’ll come back with my findings and we can discuss it further. I notice there are jars in the git repo that we obviously can’t import. They look like they might be custom JWPlayer builds of storm-kafka and storm-redis. Morrigan — Do you know if there is any differences there that required custom builds of those components? -Taylor On Sep 26, 2016, at 3:31 PM, Bobby Evans <[email protected] wrote: Does it compile against 2.X? 
On Mon, Sep 26, 2016 at 4:03 PM, P. Taylor Goetz <[email protected]> wrote:

Sounds good. I'll find out if it builds against 2.x; if so, I'll go that direction. Otherwise I'll come back with my findings and we can discuss it further.

I notice there are jars in the git repo that we obviously can't import. They look like they might be custom JW Player builds of storm-kafka and storm-redis. Morrigan - Do you know if there are any differences there that required custom builds of those components?

-Taylor

On Sep 26, 2016, at 3:31 PM, Bobby Evans <[email protected]> wrote:

Does it compile against 2.x? If so, I would prefer to have it go there, and then possibly 1.x if people want it there too.

- Bobby

On Monday, September 26, 2016 12:47 PM, P. Taylor Goetz <[email protected]> wrote:

The IP clearance vote has passed and we are now able to import the SQE code. The question now is where we want to import it to. My inclination is to import it into "external" in the 1.x branch; it can be ported to other branches as necessary or desired. An alternative would be to treat it as a feature branch, but I'd rather take the former approach. Thoughts/opinions?

-Taylor

On Sep 21, 2016, at 8:39 PM, P. Taylor Goetz <[email protected]> wrote:

My apologies. I meant to cc dev@ but didn't. Will forward in a bit...

The vote (lazy consensus) is underway on general@incubator and will close in less than 72 hours. After that the code can be merged.

-Taylor

On Sep 21, 2016, at 7:02 PM, Jungtaek Lim <[email protected]> wrote:

Hi dev,

While the code contribution of SQE is in progress, I would like to continue the discussion of how to merge SQE and Storm SQL. I did an analysis of merging them in both directions: integrating SQE into Storm SQL and vice versa.

https://cwiki.apache.org/confluence/display/STORM/Technical+analysis+of+merging+SQE+and+Storm+SQL

As I commented on that page, since I have been working on Storm SQL for some weeks I may be (heavily) biased, so I would really appreciate it if someone could do another analysis. Please feel free to share your thoughts on this analysis, another proposal if you have one, or anything else about the merge.

Thanks,
Jungtaek Lim (HeartSaVioR)

--
Morrigan Jones
Principal Engineer
JW Player | Your Way to Play
[email protected] | jwplayer.com
