I agree! A good place to start would be to write an extensive JIRA guidelines page. A good example is Spark's, which is quite crisp and clear: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

Further, the corresponding guidelines for Flink are currently on the cwiki page, but the main README links to http://flink.apache.org/how-to-contribute.html. We should perhaps merge the cwiki page content into that page.

Regards
Sachin

--
Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685

On Sun, Aug 16, 2015 at 9:21 PM, Stephan Ewen <se...@apache.org> wrote:

> Hi all!
>
> Henry raised the point about non-descriptive bug reports earlier. I would
> like to bring this to everyone's mind again and add some additional
> thoughts:
>
> We are seeing a lot of issues reported right now, and a lot of pull
> requests opened right now, for issues that are not really a problem.
>
> There are many places in the code where one could write things slightly
> differently. Some of these variations may look slightly more efficient at
> first glance, but are not anywhere on a hot code path, so they do not
> really make any difference.
>
> However, each of those changes introduces the possibility of new bugs.
> Quite a few of the proposed fixes actually changed the semantics, with
> the result that they would have broken the system instead of improving
> anything.
>
> This has been famously summed up by Donald Knuth in his quote:
>
> "*Premature optimization is the root of all evil*"
>
> Before changing a line of code in the attempt to do one comparison less,
> please check whether the change is actually worth it:
>
>   - Better more checks than fewer checks, if the code path is not hot.
>     Catching bugs better / earlier is worth a lot.
>
>   - On modern processors, performance of non-I/O code is almost always
>     limited by memory access delays (cache / TLB misses). Arithmetic and
>     checks are comparatively cheap, meaning that optimizing them usually
>     matters only in arithmetic loops, or the hottest code paths.
>
>   - Good fixes are still all fixes that address any form of resource leak,
>     or forgotten closing of streams, clients, ...
>
>   - Performance critical in Flink's runtime are mainly the Serializer code,
>     the hash/sort algorithms, the network/disk code, and the driver loops
>     for the operators.
>
>   - On the JobManager, the number and dependencies of deployment messages,
>     and the complexity of the graph traversal dominate all other
>     computation.
>
>   - Correctness and safety are always more critical than the last 1% of
>     performance.
>
>
> This was my personal view on things, please write if you agree or disagree.
>
>
> Greetings,
> Stephan
>