About DataStream extension and setting storm dependency to provided. If
this works, a big +1 from my side.

-Matthias


On 11/14/2015 05:13 PM, Matthias J. Sax wrote:
> I just had a look at your proposal. It makes a lot of sense. I still
> believe that it is a matter of taste whether one prefers your or my
> point of view. Both approaches allow to easily reuse and execute Storm
> topologies on Flink (which is the most important feature we need to
> have).
> 
> I hope to get some more feedback from the community on whether the
> Storm compatibility should be more "stormy" or more "flinky". Both
> approaches make sense to me.
> 
> 
> A few minor comments:
> 
> * FileSpout vs FiniteFileSpout
>   -> FileSpout was implemented in a Storm way -- setting a "finished"
> flag there does not make sense from a Storm point of view (there is no
> such thing as a finite spout).
>   Thus, this example shows how a regular Storm spout can be improved
> using the FiniteSpout interface -- I would keep it as is (even if it
> seems unnecessarily complicated -- imagine that you don't have the
> code of FileSpout).
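> The pattern described here -- deriving a finite spout from an unmodified
> Storm spout -- can be sketched as below. This is an illustrative,
> self-contained sketch: the real FiniteSpout interface lives in the
> compatibility layer, and SimpleFileSpout is a hypothetical stand-in for
> a plain Storm-style spout, not the actual example code.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-in for the compatibility layer's marker interface.
interface FiniteSpout {
    boolean reachedEnd();
}

// A "Storm-style" spout: it has no notion of being finished --
// nextTuple() simply has nothing to emit once the input is exhausted.
class SimpleFileSpout {
    protected final Iterator<String> lines;

    SimpleFileSpout(List<String> lines) {
        this.lines = lines.iterator();
    }

    public String nextTuple() {
        return lines.hasNext() ? lines.next() : null;
    }
}

// Without touching SimpleFileSpout itself, finiteness is added on top --
// exactly the situation where you do not own the original spout's code.
class FiniteFileSpout extends SimpleFileSpout implements FiniteSpout {
    FiniteFileSpout(List<String> lines) {
        super(lines);
    }

    @Override
    public boolean reachedEnd() {
        return !lines.hasNext();
    }
}
```

> A runtime wrapper can then drain such a spout in a bounded loop instead
> of spinning forever.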
> 
> * You changed examples to use finite-spouts -- from a testing point of
> view this makes sense. However, the examples should show how to run an
> *unmodified* Storm topology in Flink.
> 
> * we should keep the local copy "unprocessedBolts" when creating a Flink
> program to allow to re-submit the same topology object twice (or alter
> it after submission). If you don't make the copy, submitting/translating
> the topology into a Flink job alters the object (which should not
> happen). And as it is not performance critical, the copying overhead
> does not matter.
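> The defensive-copy argument can be illustrated with plain collections
> (all names here are hypothetical; in the compatibility layer the map in
> question holds the topology's bolts):

```java
import java.util.HashMap;
import java.util.Map;

class TopologyTranslator {

    // Translation repeatedly picks a bolt from the work set and removes
    // it once it has been wired into the Flink job.
    static int translate(Map<String, String> bolts) {
        // Defensive copy: translation must not mutate the caller's
        // topology object, so the same topology can be submitted twice
        // or altered after submission.
        Map<String, String> unprocessedBolts = new HashMap<>(bolts);
        int translated = 0;
        while (!unprocessedBolts.isEmpty()) {
            String next = unprocessedBolts.keySet().iterator().next();
            unprocessedBolts.remove(next); // consume the copy, not the original
            translated++;
        }
        return translated;
    }
}
```

> Without the copy, the second submission would see an already-emptied
> bolt map.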
> 
> * Why did you change the dop from 4 to 1 in WordCountTopology? We
> should test in a parallel fashion...
> 
> * Too many reformatting changes ;) You touched many classes without
> any actual code changes.
> 
> -------- Forwarded Message --------
> Subject: Re: Storm Compatibility
> Date: Fri, 13 Nov 2015 12:15:19 +0100
> From: Maximilian Michels <m...@apache.org>
> To: Matthias J. Sax <mj...@apache.org>
> CC: Stephan Ewen <se...@apache.org>, Robert Metzger <rmetz...@apache.org>
> 
> Hi Matthias,
> 
> Thank you for your remarks.
> 
> I believe the goal of the compatibility layer should not be to mimic
> Storm's API but to easily execute Storm topologies using Flink. I see
> that it is convenient for users to use the class names they know from
> Storm, but I think this makes the API verbose. I've refactored it a
> bit to make it more aligned with Flink's execution model. After all,
> the most important thing is that it makes it easy for people to reuse
> Storm topologies while getting all the advantages of Flink.
> 
> Let me explain what I have done so far:
> https://github.com/apache/flink/compare/master...mxm:storm-dev
> 
> API
> - remove FlinkClient, FlinkSubmitter, FlinkLocalCluster,
> FlinkTopology: They are not necessary in my opinion and are
> replicating functionality already included in Flink or Storm.
> 
> - Build the topology with the Storm TopologyBuilder (instead of
> FlinkTopology) which is then passed to the FlinkTopologyBuilder which
> generates the StreamExecutionEnvironment containing the StreamGraph.
> You can then simply call execute() like you would usually do in Flink.
> This lets you reuse your Storm topologies with the ease of Flink's
> context-based execution mechanism. Note that it works in local and
> remote execution mode without changing any code.
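> If I read the branch correctly, usage would look roughly like the
> sketch below. This is untested: TopologyBuilder is Storm's own class
> and FlinkTopologyBuilder/StreamExecutionEnvironment come from the
> branch, while the factory method name and the spout/bolt classes are
> my guesses.

```java
// Build the topology with Storm's own TopologyBuilder -- unmodified user code.
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("source", new SentenceSpout());
builder.setBolt("counter", new WordCountBolt()).shuffleGrouping("source");

// Hand the builder to the compatibility layer, which yields a regular
// StreamExecutionEnvironment; execute() then picks the local or remote
// context automatically, as for any Flink program.
StreamExecutionEnvironment env = FlinkTopologyBuilder.translateTopology(builder);
env.execute("storm-word-count");
```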
> 
> Tests
> - replaced StormTestBase.java with StreamingTestBase
> - use a finite source for the tests and changed it a bit
> 
> Examples
> - Convert examples to new API
> - Remove duplicate examples (local and remote)
> 
> I hope these changes are not too invasive for you. I think they make
> the compatibility layer much easier to use. Let me know what you think
> about it. Of course, we can iterate on it.
> 
> About the integration of the compatibility layer into DataStream:
> Wouldn't it be possible to set the storm dependency to provided and
> let the user include the jar if he/she wants to use the Storm
> compatibility layer? That's
> also what we do for other libraries like Gelly. You have to package
> them into the JAR if you want to run them on the cluster. We should
> give a good error message if classes cannot be found.
> 
> +1 for moving the discussion to the dev list.
> 
> Cheers,
> Max
> 
> On Fri, Nov 13, 2015 at 7:41 AM, Matthias J. Sax <mj...@apache.org> wrote:
>> One more thing that just came to my mind about (1): I have to correct my
>> last reply on it:
>>
>> We **cannot reuse** TopologyBuilder because the returned StormTopology
>> from .createTopology() does **not** contain the references to the
>> Spout/Bolt object. Internally, those are already serialized into an
>> internal Thrift representation (as preparation to get sent to Nimbus).
>> However, in order to create a Flink job, we need the references of course...
>>
>> -Matthias
>>
>>
>> On 11/11/2015 04:33 PM, Maximilian Michels wrote:
>>> Hi Matthias,
>>>
>>> Sorry for getting back to you late. I'm very new to Storm but have
>>> familiarized myself a bit over the last few days. While looking through the
>>> Storm examples and the compatibility layer I discovered the following
>>> issues:
>>>
>>> 1) The compatibility layer mirrors the Storm API instead of reusing
>>> it. Why do we need a FlinkTopologyBuilder, FlinkCluster,
>>> FlinkSubmitter, FlinkClient? Couldn't all these user-facing classes be
>>> replaced by e.g. StormExecutionEnvironment which receives the Storm
>>> topology and upon getStreamGraph() just traverses it?
>>>
>>> 2) DRPC is not yet supported. I don't know how crucial this is, but it
>>> seems to be a widespread Storm feature. If we wrapped the entire Storm
>>> topology, we could give appropriate errors when we see such
>>> unsupported features.
>>>
>>> 3) We could simplify embedding Spouts and Bolts directly as operator
>>> functions. Users shouldn't have to worry about extracting the types.
>>> Perhaps we could implement a dedicated method to add spouts/bolts on
>>> DataStream?
>>>
>>> 5) Performance: The BoltWrapper creates a StormTuple for every
>>> incoming record. I think this could be improved. Couldn't we use the
>>> StormTuple as data type instead of Flink's tuples?
>>>
>>> 6) Trident Examples. Have you run any?
>>>
>>> That's it for now. I'm sure you know about many more improvements or
>>> problems because you're the expert on this. In the meantime, I'll try
>>> to contact you via IRC.
>>>
>>> Cheers,
>>> Max
>>>
>>> On Fri, Nov 6, 2015 at 6:25 PM, Matthias J. Sax <mj...@apache.org> wrote:
>>>>
>>>> Hi,
>>>>
>>>> that sounds great! I am very happy that people are interested in it and
>>>> start to use it! Can you give some more details about this? I am just
>>>> aware of a few questions on SO. But there was no question about it on the
>>>> mailing list lately... Did you get some more internal questions/feedback?
>>>>
>>>> And of course, other people should get involved as well! There is so
>>>> much to do -- even if I work 40h a week on it, I cannot get everything
>>>> done by myself. The last days were very busy for me. I hope I can work
>>>> on a couple of bugs after the Munich Meetup. I started to look into them
>>>> already...
>>>>
>>>> Should we start a roadmap in the Wiki? This might be helpful if more
>>>> people get involved.
>>>>
>>>> And thanks for keeping me in the loop :)
>>>>
>>>> -Matthias
>>>>
>>>>
>>>> On 11/06/2015 03:49 PM, Stephan Ewen wrote:
>>>>> Hi Matthias!
>>>>>
>>>>> We are seeing a lot of people getting very excited about the Storm
>>>>> Compatibility layer. I expect that quite a few people will seriously
>>>>> start to work with it.
>>>>>
>>>>> I would suggest that we also start getting involved in that. Since
>>>>> your priority is of course your Ph.D., it would be a bit much to ask
>>>>> of you to dedicate a lot of time to support more features, be super
>>>>> responsive with users all the time, etc.
>>>>>
>>>>> To that end, some people from us will start testing the API, adding
>>>>> fixes, etc (which also helps us to understand this better when users ask
>>>>> questions).
>>>>> We would definitely like for you to stay involved (we don't want to
>>>>> hijack this), and help with ideas, especially when it comes to things
>>>>> like fault tolerance design, etc.
>>>>>
>>>>> What do you think?
>>>>>
>>>>> Greetings,
>>>>> Stephan
>>>>>
>>>>
>>