I agree!
A good place to start would be to write an extensive JIRA guidelines page.
An example would be
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark,
which is quite crisp and clear.
Further, right now the corresponding guidelines for Flink are on the cwiki
page, but the main README links to
http://flink.apache.org/how-to-contribute.html. We should perhaps merge the
cwiki page content into that page.

Regards
Sachin

-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685

On Sun, Aug 16, 2015 at 9:21 PM, Stephan Ewen <se...@apache.org> wrote:

> Hi all!
>
> Henry raised the point about non-descriptive bug reports earlier. I would
> like to bring this to everyone's mind again and add some additional
> thoughts:
>
> We are seeing a lot of issues reported right now, and a lot of pull
> requests opened right now, for issues that are not really a problem.
>
> There are many places in the code where one could write things slightly
> differently. Some of these slightly different variations may look slightly
> more efficient at first glance, but are not anywhere on a hot code path,
> so they do not really make any difference.
>
> However, every one of those changes introduces the possibility of new bugs.
> Quite a few of the proposed fixes actually changed the semantics, with
> the result that they would have broken the system instead of improving
> anything.
>
> This has been famously summed up by Donald Knuth in his quote:
>
> "Premature optimization is the root of all evil"
>
> Before changing a line of code in the attempt to do one comparison less,
> please check whether the change is actually worth it:
>
>  - Better more checks than fewer checks, if the code path is not hot.
> Catching bugs better / earlier is worth a lot.
>
>  - On modern processors, performance of non-I/O code is almost always
> limited by memory access delays (cache / TLB misses). Arithmetic and checks
> are comparatively cheap, meaning that optimizing them usually matters
> only in arithmetic loops, or the hottest code paths.
>
>  - Good fixes are still all fixes that address any form of resource leak,
> or forgotten closing of streams, clients, ...
>
>  - The performance-critical parts of Flink's runtime are mainly the
> serializer code, the hash/sort algorithms, the network/disk code, and the
> driver loops for the operators.
>
>   - On the JobManager, the number and dependencies of deployment messages,
> and the complexity of the graph traversal dominate all other computation.
>
>  - Correctness and safety are always more critical than the last 1% of
> performance.
>
>
> This is my personal view on things; please write if you agree or disagree.
>
>
> Greetings,
> Stephan
>
