I just had a look at your proposal. It makes a lot of sense. I still believe it is a matter of taste whether one prefers your point of view or mine. Both approaches allow us to easily reuse and execute Storm topologies on Flink (which is the most important feature we need to have).
I hope to get some more feedback from the community on whether the Storm compatibility layer should be more "stormy" or more "flinky". Both approaches make sense to me. A few minor comments:

* FileSpout vs. FiniteFileSpout: FileSpout was implemented in a Storm way -- setting the "finished" flag there does not make sense from a Storm point of view (there is no such thing as a finite spout). Thus, this example shows how a regular Storm spout can be improved using the FiniteSpout interface. I would keep it as is (even if it seems unnecessarily complicated -- imagine that you don't have the code of FileSpout).
* You changed the examples to use finite spouts. From a testing point of view this makes sense. However, the examples should show how to run an *unmodified* Storm topology in Flink.
* We should keep the local copy "unprocessedBolts" when creating a Flink program, to allow re-submitting the same topology object twice (or altering it after submission). If you don't make the copy, submitting/translating the topology into a Flink job alters the object (which should not happen). And as it is not performance critical, the copying overhead does not matter.
* Why did you change the dop from 4 to 1 in WordCountTopology? We should test in a parallel fashion...
* Too many reformatting changes ;) You touched many classes without any actual code changes.

-------- Forwarded Message --------
Subject: Re: Storm Compatibility
Date: Fri, 13 Nov 2015 12:15:19 +0100
From: Maximilian Michels <m...@apache.org>
To: Matthias J. Sax <mj...@apache.org>
CC: Stephan Ewen <se...@apache.org>, Robert Metzger <rmetz...@apache.org>

Hi Matthias,

Thank you for your remarks. I believe the goal of the compatibility layer should not be to mimic Storm's API but to easily execute Storm topologies using Flink. I see that it is easy for users to use class names for execution that they know from Storm, but I think this makes the API verbose. I've refactored it a bit to make it more aligned with Flink's execution model.
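To make the FiniteSpout point above concrete: a regular Storm spout only knows nextTuple(), while finiteness can be layered on from the outside without touching the spout's own code. Below is a minimal stdlib-only sketch of that pattern -- all interfaces and class names here are simplified stand-ins, not the actual Storm/Flink signatures:

```java
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for a Storm-style spout: called forever, one tuple per call.
interface SimpleSpout {
    void nextTuple();
}

// The extra signal a finite runtime can use to know when the spout is done.
interface FiniteSpout extends SimpleSpout {
    boolean reachedEnd();
}

// A "FileSpout"-like spout, reading from a list instead of a file for brevity.
// Note it has no notion of being finished -- true to the Storm model.
class LineSpout implements SimpleSpout {
    protected final Iterator<String> lines;

    LineSpout(List<String> lines) {
        this.lines = lines.iterator();
    }

    @Override
    public void nextTuple() {
        if (lines.hasNext()) {
            lines.next(); // a real spout would emit to a collector here
        }
    }
}

// The unmodified spout gains finiteness from the outside, in a subclass --
// useful even when you do not own the original spout's source code.
class FiniteLineSpout extends LineSpout implements FiniteSpout {
    FiniteLineSpout(List<String> lines) {
        super(lines);
    }

    @Override
    public boolean reachedEnd() {
        return !lines.hasNext();
    }
}

public class FiniteSpoutSketch {
    // Drive a finite spout to completion, as a finite runtime could.
    static int runToCompletion(FiniteSpout spout) {
        int calls = 0;
        while (!spout.reachedEnd()) {
            spout.nextTuple();
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        FiniteLineSpout spout = new FiniteLineSpout(List.of("a", "b", "c"));
        System.out.println(runToCompletion(spout)); // prints 3
    }
}
```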
After all, the most important thing is that it makes it easy for people to reuse Storm topologies while getting all the advantages of Flink. Let me explain what I have done so far: https://github.com/apache/flink/compare/master...mxm:storm-dev

API
- Remove FlinkClient, FlinkSubmitter, FlinkLocalCluster, FlinkTopology: they are not necessary in my opinion and replicate functionality already included in Flink or Storm.
- Build the topology with the Storm TopologyBuilder (instead of FlinkTopology), which is then passed to the FlinkTopologyBuilder, which generates the StreamExecutionEnvironment containing the StreamGraph. You can then simply call execute() like you usually would in Flink. This lets you reuse your Storm topologies with the ease of Flink's context-based execution mechanism. Note that it works in local and remote execution mode without changing any code.

Tests
- Replaced StormTestBase.java with StreamingTestBase
- Use a finite source for the tests and changed it a bit

Examples
- Convert examples to the new API
- Remove duplicate examples (local and remote)

I hope these changes are not too invasive for you. I think they make the compatibility layer much easier to use. Let me know what you think about it. Of course, we can iterate on it.

About the integration of the compatibility layer into DataStream: Wouldn't it be possible to set the storm dependency to "provided" and let users include the jar if they want to use the Storm compatibility? That's also what we do for other libraries like Gelly. You have to package them into the JAR if you want to run them on the cluster. We should give a good error message if classes cannot be found.

+1 for moving the discussion to the dev list.

Cheers,
Max

On Fri, Nov 13, 2015 at 7:41 AM, Matthias J.
Sax <mj...@apache.org> wrote:
> One more thing that just came to my mind about (1): I have to correct my
> last reply on it:
>
> We **cannot reuse** TopologyBuilder because the returned StormTopology
> from .createTopology() does **not** contain the references to the
> Spout/Bolt objects. Internally, those are already serialized into an
> internal Thrift representation (as preparation to get sent to Nimbus).
> However, in order to create a Flink job, we need the references, of course...
>
> -Matthias
>
> On 11/11/2015 04:33 PM, Maximilian Michels wrote:
>> Hi Matthias,
>>
>> Sorry for getting back to you late. I'm very new to Storm but have
>> familiarized myself a bit over the last few days. While looking through the
>> Storm examples and the compatibility layer, I discovered the following
>> issues:
>>
>> 1) The compatibility layer mirrors the Storm API instead of reusing
>> it. Why do we need a FlinkTopologyBuilder, FlinkCluster,
>> FlinkSubmitter, FlinkClient? Couldn't all these user-facing classes be
>> replaced by e.g. a StormExecutionEnvironment which receives the Storm
>> topology and upon getStreamGraph() just traverses it?
>>
>> 2) DRPC is not yet supported. I don't know how crucial this is, but it
>> seems to be a widespread Storm feature. If we wrapped the entire Storm
>> topology, we could give appropriate errors when we see such
>> unsupported features.
>>
>> 3) We could simplify embedding Spouts and Bolts directly as operator
>> functions. Users shouldn't have to worry about extracting the types.
>> Perhaps we could implement a dedicated method to add spouts/bolts on
>> DataStream?
>>
>> 5) Performance: The BoltWrapper creates a StormTuple for every
>> incoming record. I think this could be improved. Couldn't we use the
>> StormTuple as the data type instead of Flink's tuples?
>>
>> 6) Trident examples. Have you run any?
>>
>> That's it for now. I'm sure you know about many more improvements or
>> problems because you're the expert on this.
>> In the meantime, I'll try
>> to contact you via IRC.
>>
>> Cheers,
>> Max
>>
>> On Fri, Nov 6, 2015 at 6:25 PM, Matthias J. Sax <mj...@apache.org> wrote:
>>>
>>> Hi,
>>>
>>> That sounds great! I am very happy that people are interested in it and
>>> starting to use it! Can you give some more details about this? I am only
>>> aware of a few questions on SO. But there was no question about it on the
>>> mailing list lately... Did you get some more internal questions/feedback?
>>>
>>> And of course, other people should get involved as well! There is so
>>> much to do -- even if I worked 40h a week on it, I could not get everything
>>> done by myself. The last days were very busy for me. I hope I can work
>>> on a couple of bugs after the Munich Meetup. I have started to look into them
>>> already...
>>>
>>> Should we start a roadmap in the Wiki? This might be helpful if more
>>> people get involved.
>>>
>>> And thanks for keeping me in the loop :)
>>>
>>> -Matthias
>>>
>>> On 11/06/2015 03:49 PM, Stephan Ewen wrote:
>>>> Hi Matthias!
>>>>
>>>> We are seeing a lot of people getting very excited about the Storm
>>>> compatibility layer. I expect that quite a few people will seriously
>>>> start to work with it.
>>>>
>>>> I would suggest that we also start getting involved in it. Since your
>>>> priority is of course on your Ph.D., it would be a bit much to ask
>>>> of you to dedicate a lot of time to supporting more features, being
>>>> super responsive with users all the time, etc.
>>>>
>>>> To that end, some people from our side will start testing the API and adding
>>>> fixes, etc. (which also helps us understand it better when users ask
>>>> questions).
>>>> We would definitely like you to stay involved (we don't want to
>>>> hijack this), and to help with ideas, especially when it comes to things
>>>> like fault-tolerance design, etc.
>>>>
>>>> What do you think?
>>>>
>>>> Greetings,
>>>> Stephan
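A side note on the "unprocessedBolts" point raised earlier in the thread: translating a topology into a Flink job should not mutate the submitted object, otherwise a second submission of the same topology breaks. The following is a minimal stdlib-only sketch of that defensive-copy rule -- the method and class names here are hypothetical, not the actual compatibility-layer code:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrates why a translator should consume bolts from a *copy*:
// removing entries from the caller's map would make re-submission fail.
public class CopyOnTranslateSketch {

    // Translate by draining a defensive copy, leaving the input untouched.
    static int translate(Map<String, Object> bolts) {
        Map<String, Object> unprocessedBolts = new HashMap<>(bolts); // the local copy
        int translated = 0;
        while (!unprocessedBolts.isEmpty()) {
            String id = unprocessedBolts.keySet().iterator().next();
            unprocessedBolts.remove(id); // mutates only the copy
            translated++;
        }
        return translated;
    }

    public static void main(String[] args) {
        Map<String, Object> topology = new HashMap<>();
        topology.put("splitter", new Object());
        topology.put("counter", new Object());

        // The same topology object can be submitted twice, because
        // translate() never altered it.
        System.out.println(translate(topology)); // prints 2
        System.out.println(translate(topology)); // prints 2 again
    }
}
```

Since translation happens once per submission, the extra copy is not on any hot path, which matches the remark that the copying overhead does not matter.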