Comments inline:

On 20/11/2017, 14:47, "Γεώργιος Δίγκας" <[email protected]> wrote:

    Dear Andy and All,
    
    Thank you very much for the information that you have provided to me. You 
really helped me a lot.
    Below I list some of the most frequent issue types that I have found while 
I was examining the evolution of Jena.
    
    Issue Name      Issues over time        Fixed   Currently Open
    The members of an interface declaration or class should appear in a 
pre-defined order   52875   47530   5345
    Sections of code should not be "commented out"  35465   31905   3560
    Method names should comply with a naming convention     32809   29448   3361
    Constant names should comply with a naming convention   18397   16572   1825
    String literals should not be 
duplicated<https://sbforge.org/sonar/rules/show/squid:S1192?layout=false> 18106 
  16390   1716
    Standard outputs should not be used directly to log 
anything<https://sbforge.org/sonar/rules/show/squid:S106?layout=false>      
16174   14506   1668
    Exception handlers should preserve the original 
exceptions<https://sbforge.org/sonar/rules/show/squid:S1166?layout=false>       
10763   9747    1016
    Source files should not have any duplicated blocks      9141    8078    1063
    <https://sbforge.org/sonar/rules/show/squid:S1186?layout=false>Methods 
should not be 
empty<https://sbforge.org/sonar/rules/show/squid:S1186?layout=false>       8653 
   7796    857
    switch case clauses should not have too many lines of 
code<https://sbforge.org/sonar/rules/show/squid:S1151?layout=false>       8280  
  7491    789
    throws declarations should not be 
superfluous<https://sbforge.org/sonar/rules/show/squid:RedundantThrowsDeclarationCheck?layout=false>
  7012    6138    874
    Class variable fields should not have public 
accessibility<https://sbforge.org/sonar/rules/show/squid:ClassVariableVisibilityCheck?layout=false>
        6593    5954    639
    
    
1. The issue that appears the most frequently is: The members of an interface 
declaration or class should appear in a pre-defined order. "According to the 
Java Code Conventions as defined by Oracle, the members of a class or interface 
declaration should appear in the following order in the source files: 1. Class 
and instance variables, 2.Constructors, and 3. Methods."

 This is pretty nitpicky, I have rarely seen this enforced because quite 
frankly it is annoying to enforce and with a modern IDE is somewhat irrelevant 
anyway

2. The second most frequent is: Sections of code should not be "commented out". 
"Programmers should not comment out code as it bloats programs and reduces 
readability.  Unused code should be deleted and can be retrieved from source 
control history if required."

For projects with a long history, Jena started in 2001, this is pretty common. 
Often a programmer will leave a section of code commented out because they 
rewrote that section and can’t verify that they didn’t break something and may 
want to easily go back to the old version should bugs be reported.   Also 
sometimes commented out code is used to illustrate a naïve/simple 
implementation that makes it easier for programmers to understand the intent of 
the code but then the actual implementation may reflect a more performant, but 
less understandable, implementation.

      3.  Method names should comply with a naming convention. "Shared naming 
conventions allow teams to collaborate efficiently. This rule checks that all 
method names match the default provided regular expression ^[a-z][a-zA-Z0-9]*$"
      4.  Constant names should comply with a naming convention. "Shared coding 
conventions allow teams to collaborate efficiently. This rule checks that all 
constant names match the default regular expression 
^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$"

These are both personal preference type things. Some projects enforce this 
while others do not, again with a history as long as ours enforcing this is 
undesirable.

 I know that you are using the default here. If you are using the same approach 
to analyse other projects you may be wanting to customise these rules to match 
the actual conventions that a given project uses.

      5.  String literals should not be duplicated. "Duplicated string literals 
make the process of refactoring error-prone, since you must be sure to update 
all occurrences.  On the other hand, constants can be referenced from many 
places, but only need to be updated in a single place."

 I suspect that some of this comes from the fact that Jena consists of many 
modules and not every module depends on every other module so there will 
inevitably be some duplication across modules. Also string literals are used 
heavily in test cases where there will be plenty of intentional duplication 
since each set of tests should be independent of each other.

      6.  Standard outputs should not be used directly to log anything. "When 
logging a message there are several important requirements which must be 
fulfilled:
    * The user must be able to easily retrieve the logs
    * The format of all logged message must be uniform to allow the user to 
easily read the log
    * Logged data must actually be recorded
    * Sensitive data must only be logged securely
    If a program directly writes to the standard outputs, there is absolutely 
no way to comply with those requirements. That's why defining and using a 
dedicated logger is highly recommended."

 We do predominantly use a logging framework. However, we also provide a suite 
of command line tools which must interact with Standard output to be useful to 
the end user.

      7.  Exception handlers should preserve the original exceptions. "When 
handling a caught exception, the original exception's message and stack trace 
should be logged or passed forward."

 While this is an ideal there are many cases where actually you do want to 
suppress errors. Either because the underlying error is not helpful or because 
you are explicitly trying to recover from an error and you don’t want your 
cleanup code to be derailed by further errors. We use this pattern extensively 
in relation to IO, when something goes wrong with IO you want to make sure that 
resource handles are not leaked by closing streams etc but you don’t know in 
advance that the attempts to close might also fail i.e. you want to make a best 
efforts to clean up and the further errors are irrelevant to the original error.

      8.  Source files should not have any duplicated blocks. "An issue is 
created on a file as soon as there is at least one block of duplicated code on 
this file"
      9.  Methods should not be empty. "There are several reasons for a method 
not to have a method body:
    * It is an unintentional omission, and should be fixed to prevent an 
unexpected behavior in production.
    * It is not yet, or never will be, supported. In this case an 
UnsupportedOperationException should be thrown.
    * The method is an intentionally-blank override. In this case a nested 
comment should explain the reason for the blank override."
      10. switch case clauses should not have too many lines of code. "The 
switch statement should be used only to clearly define some new branches in the 
control flow. As soon as a case clause contains too many statements this highly 
decreases the readability of the overall control flow statement. In such case, 
the content of the case clause should be extracted into a dedicated method."
      11.
    "throws" declarations should not be superfluous. "An exception in a throws 
declaration in Java is superfluous if it is:
    * listed multiple times
    * a subclass of another listed exception
    * a RuntimeException, or one of its descendants
    * completely unnecessary because the declared exception type cannot 
actually be thrown"
      12. Class variable fields should not have public accessibility. "Public 
class variable fields do not respect the encapsulation principle and has three 
main disadvantages:
    * Additional behavior such as validation cannot be added.
    * The internal representation is exposed, and cannot be changed afterwards.
    * Member values are subject to change from anywhere in the code and may not 
meet the programmer's assumptions.
    By using private attributes and accessor methods (set and get), 
unauthorized modifications are prevented."

 Again these are mostly stylistic preference and history combined. We try to 
avoid making unnecessary changes to our code base, and with 11 and 12 these 
form part of our public API which we have to take great care in making changes 
to in order to ensure we don’t create problems for our users when they upgrade 
Versions.
    
    I would like to ask you:
    
      *   If you agree/disagree (and why)
      *   Are you aware of those?
      *   Why do you think that they introduced?

Comments with regards to your first three questions are inline above.

I think my general comment is that much of these “issues” are stylistic 
preferences in my personal opinion.  The Oracle Java code style referenced by a 
few of those rules is a specific example of a style, while projects may follow 
some aspects of it each project tends to evolve its own style and may 
incorporate several different styles as it evolves over time. There are other 
base styles but I don’t think I have ever worked on a single project, neither 
open source or commercial, that has wholesale adopted one of those. Even in 
project that strictly enforced styles those styles have been tailored to the 
project in question.

      *   Do you think that all those introduced because Jena does not enforce 
a "squash-before-merging" policy?

This has zero relevance in my opinion. This is primarily down to the long 
history of the code base and the fact that we haven’t ever enforced a specific 
code style. While we try and avoid introducing these issues they are very much 
a minor concern compared to addressing actual functional bug reports.

      *   Are you planing to fix some issues of the above categories?

Probably not. As with all open source projects Jena depends on volunteers to 
donate their time. Volunteers typically choose to spend their time on feature 
and bug work, people are not interested on spending their time on cleaning up 
years of stylistic inconsistency which as I stated are often personal 
preference issues anyway.

Also where things form the public API we make a concerted effort to avoid 
unnecessary changes. And where we do consider the changes necessary we follow a 
deprecation process that means old methods and fields often continue to exist 
for several releases alongside their new equivalents before being removed.

Rob

      *   All the currently open issues are listed on the sheet: Jena: Open 
Issues - October 7, 2017 of 
this<https://docs.google.com/spreadsheets/d/1DloQ_GS9l2KS6ldgdHOQkjsCB1J_rrMyUauHC_Ymgfk/edit?usp=sharing>
 spreadsheet
    
    Thank you in advance!
    
    
    With kind regards,
    
    
    George Digkas
    
    ________________________________
    From: Andy Seaborne <[email protected]>
    Sent: Saturday, November 18, 2017 6:43 PM
    To: Γεώργιος Δίγκας; [email protected]
    Subject: Re: Issues fixed in Apache Jena
    
    Jena was registered at SourceForge 2001-11-20
    
    I found this in ASF SVN:
    
    [[
    Added Mon Nov 26 17:41:44 2001 UTC (15 years, 11 months ago) by der
    ]]
    on copyright.txt.
    
    so it looks like we have full history somewhere in the Apache
    infrastructure.
    
    CVS:SF->SVN:SF->SVN:ASF
    
    ASF git does not include the pre-git history.
    
         Andy
    
    On 16/11/17 11:55, Andy Seaborne wrote:
    > Do not take git as complete!
    >
    > Jena started in 2000.
    > https://lists.w3.org/Archives/Public/www-rdf-interest/2000Aug/0128.html
    >
    > Jena 2.0 was released 2003-08-28.
    > A whole 40M including dependencies! A 14.7M zip file!
    > https://sourceforge.net/projects/jena/files/
    >
    > The whole of SF SVN history was imported by the Apache infrastructure
    > team (a herculean effort) into Apache SVN. I don't know how to get to it
    > from git, it may not be there and only in SVN.
    >
    > The earliest git root commit is for the move to Apache from SF
    > [4298106f1e], 6 years ago. (There are 4 root commits due to merges)
    
    
    
    
    >
    > ---
    >
    > It's an interesting start and to make the analysis usefully inform the
    > reader as to the state of the project I suggest treating different kinds
    > of issues different, not uniformly important.
    >
    > There are many (, many) minor things and they outweigh the major
    > problems. Calling them all "issues" gives them equal weight. Some are
    > about canonicalization of the code.
    >
    > Yet reformatting the whole code base (if practical, which it arguable)
    > then greatly decreases the usefulness of git history. That would be a
    > huge loss.
    >
    > (NB the "issue" word has a specific meaning for JIRA which a lot of
    > Apache projects use. Jena's current total, now, is 1424.)
    >
    >      Andy
    >
    >>
    >> Thank you in advance!
    >>
    >>
    >> With kind regards,
    >>
    >> George Digkas
    




Reply via email to