+1 from me as well. I'd like to add that there are many relevant issues listed in JIRA that report bugs or describe interesting or important improvements to the system. Working on one of these issues would be a great contribution to Flink.

Cheers, Fabian

2015-08-16 18:30 GMT+02:00 Sachin Goel <sachingoel0...@gmail.com>:

> I agree!
> A good place to start would be to write an extensive JIRA guidelines page;
> an example would be
> https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark.
> This is quite crisp and clear.
> Further, right now the corresponding guidelines for Flink are on the cwiki
> page, but the main README links to
> http://flink.apache.org/how-to-contribute.html. We should perhaps merge
> the cwiki page content here.
>
> Regards
> Sachin
>
> -- Sachin Goel
> Computer Science, IIT Delhi
> m. +91-9871457685
>
> On Sun, Aug 16, 2015 at 9:21 PM, Stephan Ewen <se...@apache.org> wrote:
>
> > Hi all!
> >
> > Henry raised the point about non-descriptive bug reports earlier. I would
> > like to bring this to everyone's mind again and add some additional
> > thoughts:
> >
> > We are seeing a lot of issues reported right now, and a lot of pull
> > requests opened right now, for issues that are not really a problem.
> >
> > There are many places in the code where one could write things slightly
> > differently. Some of these variations may look slightly more efficient at
> > first glance, but they are not anywhere on a hot code path, so they do
> > not really make any difference.
> >
> > However, each of those changes introduces the possibility of new bugs.
> > Quite a few of the proposed fixes had actually changed the semantics,
> > with the result that they would have broken the system instead of
> > improving anything.
> >
> > This has been famously summed up by Donald Knuth in his quote:
> >
> > "*Premature optimization is the root of all evil*"
> >
> > Before changing a line of code in the attempt to do one comparison less,
> > please check whether the change is actually worth it:
> >
> > - Better more checks than fewer checks, if the code path is not hot.
> > Catching bugs better / earlier is worth a lot.
> >
> > - On modern processors, performance of non-I/O code is almost always
> > limited by memory access delays (cache / TLB misses). Arithmetic and
> > checks are comparatively cheap, meaning that optimizing them usually
> > matters only in arithmetic loops or on the hottest code paths.
> >
> > - Good fixes are still all fixes that address any form of resource leak,
> > or forgotten closing of streams, clients, ...
> >
> > - The performance-critical parts of Flink's runtime are mainly the
> > serializer code, the hash/sort algorithms, the network/disk code, and
> > the driver loops for the operators.
> >
> > - On the JobManager, the number and dependencies of deployment messages
> > and the complexity of the graph traversal dominate all other computation.
> >
> > - Correctness and safety are always more critical than the last 1% of
> > performance.
> >
> >
> > This was my personal view on things, please write if you agree or
> > disagree.
> >
> >
> > Greetings,
> > Stephan
> >
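
To make two of the points above concrete (prefer an extra check when the code path is not hot, and never forget to close streams), here is a minimal Java sketch. The class and method names are made up for illustration and are not taken from the Flink code base; the example only assumes a Java 8 standard library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Objects;

// Hypothetical example class, not part of Flink.
class SafeReadExample {

    /** Counts the non-empty lines in the given text. */
    static int countNonEmptyLines(String text) {
        // Cheap up-front check: this is not a hot path, so failing early
        // with a clear message is worth more than one saved comparison.
        Objects.requireNonNull(text, "text must not be null");

        int count = 0;
        // try-with-resources closes the reader on every exit path,
        // including when readLine() throws.
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    count++;
                }
            }
        } catch (IOException e) {
            // Surface I/O problems instead of swallowing them.
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countNonEmptyLines("a\n\nb\n"));
    }
}
```

The null check costs next to nothing here, while the try-with-resources block removes a whole class of "forgotten close" bugs, which is the kind of fix described above as still worthwhile.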