To keep this discussion from going stale, I'd like to note where Storm SQL stands now (including the changes since my technical analysis).
1. There are six pull requests open for Storm SQL: https://github.com/apache/storm/pulls/HeartSaVioR

2. Once STORM-2125 <https://github.com/apache/storm/pull/1714> is merged to master, Storm SQL will handle most of the features Calcite documents on its SQL reference page. I wrote Storm SQL's own reference page <https://github.com/HeartSaVioR/storm/blob/43478edbd7369047ccc417d6b81ab6a314910437/docs/storm-sql-reference.md> and submitted it in another pull request (one of the six). STORM-2125 has a +1 but is waiting for Calcite 1.10.0 to be released so it can pick up a bugfix. (Yes, one could argue that is a bit tightly coupled.) I expect they can cut a 1.10.0 RC1 next week; if that doesn't happen, we can stick with Calcite 1.9.0 and upgrade to 1.10.0 as soon as it is available.

3. While adding tests I found some bugs on the Calcite side (noted on each test and in the reference doc), which we can contribute back to the Calcite community. I think this makes for a positive feedback loop between the Storm and Calcite projects.

There is still plenty of room to work on in Storm SQL:

1. I haven't had much time for it yet, but I would like to learn Calcite and try to optimize Storm SQL via STORM-1446 <https://issues.apache.org/jira/browse/STORM-1446>. If someone who understands Calcite well takes over and contributes in this area, I would much appreciate it.

2. Trident does not seem to be a good backend API for Storm SQL in the long term: it doesn't support typed operations, and joins and aggregations are limited to a micro-batch, so results are non-deterministic. I'm waiting for the higher-level API (STORM-1961 <http://issues.apache.org/jira/browse/STORM-1961> is the start) and plan to move the backend onto it.

3. Storm SQL needs more connectors as data sources. That work isn't hard, just a bit time-consuming.

4. Storm SQL needs time to stabilize through an experiment loop: early adopters trying it out, reporting issues, and bugs getting fixed.

5. Longer term, we can expand the features Storm SQL supports. Joins between a streaming data source and a normal table would help enrich or filter data against other external data sources. There is also a continuing effort in the Calcite community, the 'Streaming SQL' work led by Julian, and so on.

Looking forward to continuing the discussion with the JW Player folks.

Thanks,
Jungtaek Lim (HeartSaVioR)

ps. I'd really like to make the revamped Storm SQL available to the community. Do we think the merge process will put this effort on hold? I hope not, but I will follow the community's decision.

On Sep 30, 2016, at 3:58 AM, P. Taylor Goetz <[email protected]> wrote:

FYI, I've merged the SQE code and documentation into the sqe_import branch:
https://github.com/apache/storm/tree/sqe_import

Note that the build will fail on the last component (storm-sqe) due to the compilation issues mentioned earlier.

-Taylor

On Sep 29, 2016, at 11:13 AM, Bobby Evans <[email protected]> wrote:

Agreed, or if we can find a way not to break compatibility (for example, a common base class holding most of the logic, with one subclass that uses a String and another that uses a byte array).

- Bobby

On Thursday, September 29, 2016 10:05 AM, Jungtaek Lim <[email protected]> wrote:

The change to storm-redis seems to require breaking backward compatibility, so I would love to see it come in as a separate pull request through the general review process. If it can be integrated into SQE without changing storm-redis, that would be nice. Does that make sense?

- Jungtaek Lim (HeartSaVioR)
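(For illustration: a minimal Java sketch of the backward-compatible split Bobby suggests above, assuming a generic base class. The class and method names here are hypothetical, not the actual storm-redis API.)

    import java.io.Serializable;
    import java.nio.charset.StandardCharsets;

    // Hypothetical sketch, not the real storm-redis API: shared key-handling
    // logic lives in a generic base class, and each subclass only decides how
    // the key is represented (String for existing users, byte[] for binary serde).
    public abstract class BaseRedisKeyMapper<K> implements Serializable {

        // Shared logic: every mapper ultimately hands Redis a byte[] key.
        public final byte[] encodeKey(K key) {
            return toBytes(key);
        }

        protected abstract byte[] toBytes(K key);
    }

    // Keeps today's String-based behaviour, so existing topologies stay compatible.
    class StringRedisKeyMapper extends BaseRedisKeyMapper<String> {
        @Override
        protected byte[] toBytes(String key) {
            return key.getBytes(StandardCharsets.UTF_8);
        }
    }

    // New code (e.g. a binary serde for SQE) can pass raw bytes through untouched.
    class BinaryRedisKeyMapper extends BaseRedisKeyMapper<byte[]> {
        @Override
        protected byte[] toBytes(byte[] key) {
            return key;
        }
    }

The point of such a split would be that existing String-based users see no API change, while a byte[]-based subclass covers the binary serde case discussed later in the thread.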
Taylor Goetz <[email protected]>님이 작성: The changes are available in GitHub, I just overlooked it. And they’re actually neatly contained in two commits: The changes to storm-kafka can be found here: https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572836104755a The changes to storm-redis are here: https://github.com/jwplayer/storm/commit/30d000d3ff673efa8b927d23e554a705fb2928b8 < https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572836104755a > The changes to storm-kafka are straightforward and implemented in such a way that they would be useful for use cases outside of SQE. As the commit message states, it adds a new kafka deserialization scheme (FullScheme) that includes the key, value, topic, partition and offset when reading from kafka, which is a feature I can see as being valuable for some use cases. I would be +1 for merging that code. The changes to storm-redis are a little different, as Morrigan pointed out, because it only addresses the Trident API, but IMHO it looks like a good direction. @HeartSavior — Would you have some time to take a look at the storm-redis changes and provide your opinion, since you’re one of the original authors of that code? -Taylor On Sep 26, 2016, at 6:28 PM, Jungtaek Lim <[email protected]> wrote: Great! For storm-redis, we might need to modify key/value mapper to use byte[] rather than String. When I co-authored storm-redis, I forgot considering binary format of serde. If we want to address that part, we can also address it. 2016년 9월 27일 (화) 오전 7:19, Morrigan Jones <[email protected]>님이 작성: Sure, when I can. Storm-kafka should be pretty easy. The storm-redis one will require more work to make it more complete. On Mon, Sep 26, 2016 at 6:09 PM, P. Taylor Goetz <[email protected]> wrote: Thanks for the explanation Morrigan! Would you be willing to provide a pull request or patch so the community can review? It sounds like at least some of the changes you mention could be useful to the broader community (beyond the SQL effort). Thanks again, -Taylor On Sep 26, 2016, at 4:40 PM, Morrigan Jones <[email protected]> wrote: storm-kafka - This is needed because storm-kafka does not provide a scheme class that gives you the key, value (payload), partition, and offset. MessageMetadataScheme.java comes comes closest, but is missing the key. This was a pretty simple change on my part. storm-redis - This is needed for proper support of Redis hashes. The existing storm-redis uses a static string (additionalKey in the RedisDataTypeDescription class) for the field name in hash types. I updated it to use a configurable KeyFactory for both the hash name and the field name. We also added some limited support for set types. This is admittedly the messiest between the two jars since we only cared about the trident states and would require a lot more changes to get storm-redis more "feature complete" overall. On Mon, Sep 26, 2016 at 4:03 PM, P. Taylor Goetz <[email protected]> wrote: Sounds good. I’ll find out if it builds against 2.x. If so I’ll go that direction. Otherwise I’ll come back with my findings and we can discuss it further. I notice there are jars in the git repo that we obviously can’t import. They look like they might be custom JWPlayer builds of storm-kafka and storm-redis. Morrigan — Do you know if there is any differences there that required custom builds of those components? -Taylor On Sep 26, 2016, at 3:31 PM, Bobby Evans <[email protected] wrote: Does it compile against 2.X? 
On Mon, Sep 26, 2016 at 4:03 PM, P. Taylor Goetz <[email protected]> wrote:

Sounds good. I'll find out if it builds against 2.x; if so, I'll go that direction. Otherwise I'll come back with my findings and we can discuss it further.

I notice there are jars in the git repo that we obviously can't import. They look like they might be custom JW Player builds of storm-kafka and storm-redis. Morrigan - Do you know if there are any differences there that required custom builds of those components?

-Taylor

On Sep 26, 2016, at 3:31 PM, Bobby Evans <[email protected]> wrote:

Does it compile against 2.x? If so, I would prefer to have it go there, and then possibly 1.x if people want it there too.

- Bobby

On Monday, September 26, 2016 12:47 PM, P. Taylor Goetz <[email protected]> wrote:

The IP clearance vote has passed and we are now able to import the SQE code. The question now is where we want to import it to. My inclination is to import it into "external" in the 1.x branch; it can be ported to other branches as necessary or desired. An alternative would be to treat it as a feature branch, but I'd rather take the former approach. Thoughts/opinions?

-Taylor

On Sep 21, 2016, at 8:39 PM, P. Taylor Goetz <[email protected]> wrote:

My apologies. I meant to cc dev@ but didn't. Will forward in a bit...

The vote (lazy consensus) is underway on general@incubator and will close in less than 72 hours. After that the code can be merged.

-Taylor

On Sep 21, 2016, at 7:02 PM, Jungtaek Lim <[email protected]> wrote:

Hi dev,

While the code contribution of SQE is in progress, I would like to continue the discussion of how to merge SQE and Storm SQL. I did an analysis of merging them in both directions: integrating SQE into Storm SQL and vice versa.

https://cwiki.apache.org/confluence/display/STORM/Technical+analysis+of+merging+SQE+and+Storm+SQL

As I commented on that page, since I have been working on Storm SQL for some weeks I may be (heavily) biased, so I would really appreciate it if someone could do another analysis. Please feel free to share your thoughts on this analysis, another proposal if you have one, or anything else about the merge.

Thanks,
Jungtaek Lim (HeartSaVioR)

--
Morrigan Jones
Principal Engineer
JW Player | Your Way to Play
[email protected] | jwplayer.com
