I just had a look at your proposal. It makes a lot of sense. I still believe it is a matter of taste whether one prefers your point of view or mine. Both approaches allow us to easily reuse and execute Storm topologies on Flink (which is the most important feature we need to have).
I hope to get some more feedback from the community on whether the Storm compatibility layer should be more "stormy" or more "flinky". Both approaches make sense to me. A few minor comments:

* FileSpout vs. FiniteFileSpout: FileSpout was implemented in a Storm way -- setting the "finished" flag there does not make sense from a Storm point of view (there is no such thing as a finite spout). Thus, this example shows how a regular Storm spout can be improved using the FiniteSpout interface. I would keep it as is (even if it seems unnecessarily complicated -- imagine that you don't have the code of FileSpout).
* You changed the examples to use finite spouts. From a testing point of view this makes sense. However, the examples should show how to run an *unmodified* Storm topology in Flink.
* We should keep the local copy "unprocessedBolts" when creating a Flink program, to allow re-submitting the same topology object twice (or altering it after submission). If you don't make the copy, submitting/translating the topology into a Flink job alters the object (which should not happen). And as it is not performance critical, the copying overhead does not matter.
* Why did you change the dop from 4 to 1 in WordCountTopology? We should test in a parallel fashion...
* Too many reformatting changes ;) You touched many classes without any actual code changes.

-------- Forwarded Message --------
Subject: Re: Storm Compatibility
Date: Fri, 13 Nov 2015 12:15:19 +0100
From: Maximilian Michels <m...@apache.org>
To: Matthias J. Sax <mj...@apache.org>
CC: Stephan Ewen <se...@apache.org>, Robert Metzger <rmetz...@apache.org>

Hi Matthias,

Thank you for your remarks. I believe the goal of the compatibility layer should not be to mimic Storm's API but to easily execute Storm topologies using Flink. I see that it is easy for users to use class names for execution that they know from Storm, but I think this makes the API verbose. I've refactored it a bit to make it more aligned with Flink's execution model.
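To make the FiniteSpout point above concrete: a regular Storm spout only knows nextTuple(), while finiteness can be layered on from the outside without touching the spout's own code. Below is a minimal stdlib-only sketch of that pattern -- all interfaces and class names here are simplified stand-ins, not the actual Storm/Flink signatures:

```java
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for a Storm-style spout: called forever, one tuple per call.
interface SimpleSpout {
    void nextTuple();
}

// The extra signal a finite runtime can use to know when the spout is done.
interface FiniteSpout extends SimpleSpout {
    boolean reachedEnd();
}

// A "FileSpout"-like spout, reading from a list instead of a file for brevity.
// Note it has no notion of being finished -- true to the Storm model.
class LineSpout implements SimpleSpout {
    protected final Iterator<String> lines;

    LineSpout(List<String> lines) {
        this.lines = lines.iterator();
    }

    @Override
    public void nextTuple() {
        if (lines.hasNext()) {
            lines.next(); // a real spout would emit to a collector here
        }
    }
}

// The unmodified spout gains finiteness from the outside, in a subclass --
// useful even when you do not own the original spout's source code.
class FiniteLineSpout extends LineSpout implements FiniteSpout {
    FiniteLineSpout(List<String> lines) {
        super(lines);
    }

    @Override
    public boolean reachedEnd() {
        return !lines.hasNext();
    }
}

public class FiniteSpoutSketch {
    // Drive a finite spout to completion, as a finite runtime could.
    static int runToCompletion(FiniteSpout spout) {
        int calls = 0;
        while (!spout.reachedEnd()) {
            spout.nextTuple();
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        FiniteLineSpout spout = new FiniteLineSpout(List.of("a", "b", "c"));
        System.out.println(runToCompletion(spout)); // prints 3
    }
}
```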
After all, the most important thing is that it makes it easy for people to reuse Storm topologies while getting all the advantages of Flink. Let me explain what I have done so far: https://github.com/apache/flink/compare/master...mxm:storm-dev

API
- Remove FlinkClient, FlinkSubmitter, FlinkLocalCluster, FlinkTopology: they are not necessary in my opinion and replicate functionality already included in Flink or Storm.
- Build the topology with the Storm TopologyBuilder (instead of FlinkTopology), which is then passed to the FlinkTopologyBuilder, which generates the StreamExecutionEnvironment containing the StreamGraph. You can then simply call execute() like you usually would in Flink. This lets you reuse your Storm topologies with the ease of Flink's context-based execution mechanism. Note that it works in local and remote execution mode without changing any code.

Tests
- Replaced StormTestBase.java with StreamingTestBase
- Use a finite source for the tests and changed it a bit

Examples
- Convert examples to the new API
- Remove duplicate examples (local and remote)

I hope these changes are not too invasive for you. I think they make the compatibility layer much easier to use. Let me know what you think about it. Of course, we can iterate on it.

About the integration of the compatibility layer into DataStream: Wouldn't it be possible to set the storm dependency to "provided" and let users include the jar if they want to use the Storm compatibility? That's also what we do for other libraries like Gelly. You have to package them into the JAR if you want to run them on the cluster. We should give a good error message if classes cannot be found.

+1 for moving the discussion to the dev list.

Cheers,
Max

On Fri, Nov 13, 2015 at 7:41 AM, Matthias J.
Sax <mj...@apache.org> wrote:
> One more thing that just came to my mind about (1): I have to correct my
> last reply on it:
>
> We **cannot reuse** TopologyBuilder because the returned StormTopology
> from .createTopology() does **not** contain the references to the
> Spout/Bolt objects. Internally, those are already serialized into an
> internal Thrift representation (as preparation to get sent to Nimbus).
> However, in order to create a Flink job, we need the references, of course...
>
> -Matthias
>
> On 11/11/2015 04:33 PM, Maximilian Michels wrote:
>> Hi Matthias,
>>
>> Sorry for getting back to you late. I'm very new to Storm but have
>> familiarized myself a bit over the last few days. While looking through the
>> Storm examples and the compatibility layer, I discovered the following
>> issues:
>>
>> 1) The compatibility layer mirrors the Storm API instead of reusing
>> it. Why do we need a FlinkTopologyBuilder, FlinkCluster,
>> FlinkSubmitter, FlinkClient? Couldn't all these user-facing classes be
>> replaced by e.g. a StormExecutionEnvironment which receives the Storm
>> topology and upon getStreamGraph() just traverses it?
>>
>> 2) DRPC is not yet supported. I don't know how crucial this is, but it
>> seems to be a widespread Storm feature. If we wrapped the entire Storm
>> topology, we could give appropriate errors when we see such
>> unsupported features.
>>
>> 3) We could simplify embedding Spouts and Bolts directly as operator
>> functions. Users shouldn't have to worry about extracting the types.
>> Perhaps we could implement a dedicated method to add spouts/bolts on
>> DataStream?
>>
>> 5) Performance: The BoltWrapper creates a StormTuple for every
>> incoming record. I think this could be improved. Couldn't we use the
>> StormTuple as the data type instead of Flink's tuples?
>>
>> 6) Trident examples. Have you run any?
>>
>> That's it for now. I'm sure you know about many more improvements or
>> problems because you're the expert on this.
>> In the meantime, I'll try
>> to contact you via IRC.
>>
>> Cheers,
>> Max
>>
>> On Fri, Nov 6, 2015 at 6:25 PM, Matthias J. Sax <mj...@apache.org> wrote:
>>>
>>> Hi,
>>>
>>> That sounds great! I am very happy that people are interested in it and
>>> starting to use it! Can you give some more details about this? I am only
>>> aware of a few questions on SO. But there was no question about it on the
>>> mailing list lately... Did you get some more internal questions/feedback?
>>>
>>> And of course, other people should get involved as well! There is so
>>> much to do -- even if I worked 40h a week on it, I could not get everything
>>> done by myself. The last days were very busy for me. I hope I can work
>>> on a couple of bugs after the Munich Meetup. I have started to look into them
>>> already...
>>>
>>> Should we start a roadmap in the Wiki? This might be helpful if more
>>> people get involved.
>>>
>>> And thanks for keeping me in the loop :)
>>>
>>> -Matthias
>>>
>>> On 11/06/2015 03:49 PM, Stephan Ewen wrote:
>>>> Hi Matthias!
>>>>
>>>> We are seeing a lot of people getting very excited about the Storm
>>>> compatibility layer. I expect that quite a few people will seriously
>>>> start to work with it.
>>>>
>>>> I would suggest that we also start getting involved in it. Since your
>>>> priority is of course on your Ph.D., it would be a bit much to ask
>>>> of you to dedicate a lot of time to supporting more features, being
>>>> super responsive with users all the time, etc.
>>>>
>>>> To that end, some people from our side will start testing the API and adding
>>>> fixes, etc. (which also helps us understand it better when users ask
>>>> questions).
>>>> We would definitely like you to stay involved (we don't want to
>>>> hijack this), and to help with ideas, especially when it comes to things
>>>> like fault-tolerance design, etc.
>>>>
>>>> What do you think?
>>>>
>>>> Greetings,
>>>> Stephan
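A side note on the "unprocessedBolts" point raised earlier in the thread: translating a topology into a Flink job should not mutate the submitted object, otherwise a second submission of the same topology breaks. The following is a minimal stdlib-only sketch of that defensive-copy rule -- the method and class names here are hypothetical, not the actual compatibility-layer code:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrates why a translator should consume bolts from a *copy*:
// removing entries from the caller's map would make re-submission fail.
public class CopyOnTranslateSketch {

    // Translate by draining a defensive copy, leaving the input untouched.
    static int translate(Map<String, Object> bolts) {
        Map<String, Object> unprocessedBolts = new HashMap<>(bolts); // the local copy
        int translated = 0;
        while (!unprocessedBolts.isEmpty()) {
            String id = unprocessedBolts.keySet().iterator().next();
            unprocessedBolts.remove(id); // mutates only the copy
            translated++;
        }
        return translated;
    }

    public static void main(String[] args) {
        Map<String, Object> topology = new HashMap<>();
        topology.put("splitter", new Object());
        topology.put("counter", new Object());

        // The same topology object can be submitted twice, because
        // translate() never altered it.
        System.out.println(translate(topology)); // prints 2
        System.out.println(translate(topology)); // prints 2 again
    }
}
```

Since translation happens once per submission, the extra copy is not on any hot path, which matches the remark that the copying overhead does not matter.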