Comments inline:
On 20/11/2017, 14:47, "Γεώργιος Δίγκας" <[email protected]> wrote:
Dear Andy and All,
Thank you very much for the information that you have provided to me. You
really helped me a lot.
Below I list some of the most frequent issue types that I have found while
I was examining the evolution of Jena.
Issue Name Issues over time Fixed Currently Open
The members of an interface declaration or class should appear in a
pre-defined order 52875 47530 5345
Sections of code should not be "commented out" 35465 31905 3560
Method names should comply with a naming convention 32809 29448 3361
Constant names should comply with a naming convention 18397 16572 1825
String literals should not be
duplicated<https://sbforge.org/sonar/rules/show/squid:S1192?layout=false> 18106
16390 1716
Standard outputs should not be used directly to log
anything<https://sbforge.org/sonar/rules/show/squid:S106?layout=false>
16174 14506 1668
Exception handlers should preserve the original
exceptions<https://sbforge.org/sonar/rules/show/squid:S1166?layout=false>
10763 9747 1016
Source files should not have any duplicated blocks 9141 8078 1063
<https://sbforge.org/sonar/rules/show/squid:S1186?layout=false>Methods
should not be
empty<https://sbforge.org/sonar/rules/show/squid:S1186?layout=false> 8653
7796 857
switch case clauses should not have too many lines of
code<https://sbforge.org/sonar/rules/show/squid:S1151?layout=false> 8280
7491 789
throws declarations should not be
superfluous<https://sbforge.org/sonar/rules/show/squid:RedundantThrowsDeclarationCheck?layout=false>
7012 6138 874
Class variable fields should not have public
accessibility<https://sbforge.org/sonar/rules/show/squid:ClassVariableVisibilityCheck?layout=false>
6593 5954 639
1. The issue that appears the most frequently is: The members of an interface
declaration or class should appear in a pre-defined order. "According to the
Java Code Conventions as defined by Oracle, the members of a class or interface
declaration should appear in the following order in the source files: 1. Class
and instance variables, 2.Constructors, and 3. Methods."
This is pretty nitpicky, I have rarely seen this enforced because quite
frankly it is annoying to enforce and with a modern IDE is somewhat irrelevant
anyway
2. The second most frequent is: Sections of code should not be "commented out".
"Programmers should not comment out code as it bloats programs and reduces
readability. Unused code should be deleted and can be retrieved from source
control history if required."
For projects with a long history, Jena started in 2001, this is pretty common.
Often a programmer will leave a section of code commented out because they
rewrote that section and can’t verify that they didn’t break something and may
want to easily go back to the old version should bugs be reported. Also
sometimes commented out code is used to illustrate a naïve/simple
implementation that makes it easier for programmers to understand the intent of
the code but then the actual implementation may reflect a more performant, but
less understandable, implementation.
3. Method names should comply with a naming convention. "Shared naming
conventions allow teams to collaborate efficiently. This rule checks that all
method names match the default provided regular expression ^[a-z][a-zA-Z0-9]*$"
4. Constant names should comply with a naming convention. "Shared coding
conventions allow teams to collaborate efficiently. This rule checks that all
constant names match the default regular expression
^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$"
These are both personal preference type things. Some projects enforce this
while others do not, again with a history as long as ours enforcing this is
undesirable.
I know that you are using the default here. If you are using the same approach
to analyse other projects you may be wanting to customise these rules to match
the actual conventions that a given project uses.
5. String literals should not be duplicated. "Duplicated string literals
make the process of refactoring error-prone, since you must be sure to update
all occurrences. On the other hand, constants can be referenced from many
places, but only need to be updated in a single place."
I suspect that some of this comes from the fact that Jena consists of many
modules and not every module depends on every other module so there will
inevitably be some duplication across modules. Also string literals are used
heavily in test cases where there will be plenty of intentional duplication
since each set of tests should be independent of each other.
6. Standard outputs should not be used directly to log anything. "When
logging a message there are several important requirements which must be
fulfilled:
* The user must be able to easily retrieve the logs
* The format of all logged message must be uniform to allow the user to
easily read the log
* Logged data must actually be recorded
* Sensitive data must only be logged securely
If a program directly writes to the standard outputs, there is absolutely
no way to comply with those requirements. That's why defining and using a
dedicated logger is highly recommended."
We do predominantly use a logging framework. However, we also provide a suite
of command line tools which must interact with Standard output to be useful to
the end user.
7. Exception handlers should preserve the original exceptions. "When
handling a caught exception, the original exception's message and stack trace
should be logged or passed forward."
While this is an ideal there are many cases where actually you do want to
suppress errors. Either because the underlying error is not helpful or because
you are explicitly trying to recover from an error and you don’t want your
cleanup code to be derailed by further errors. We use this pattern extensively
in relation to IO, when something goes wrong with IO you want to make sure that
resource handles are not leaked by closing streams etc but you don’t know in
advance that the attempts to close might also fail i.e. you want to make a best
efforts to clean up and the further errors are irrelevant to the original error.
8. Source files should not have any duplicated blocks. "An issue is
created on a file as soon as there is at least one block of duplicated code on
this file"
9. Methods should not be empty. "There are several reasons for a method
not to have a method body:
* It is an unintentional omission, and should be fixed to prevent an
unexpected behavior in production.
* It is not yet, or never will be, supported. In this case an
UnsupportedOperationException should be thrown.
* The method is an intentionally-blank override. In this case a nested
comment should explain the reason for the blank override."
10. switch case clauses should not have too many lines of code. "The
switch statement should be used only to clearly define some new branches in the
control flow. As soon as a case clause contains too many statements this highly
decreases the readability of the overall control flow statement. In such case,
the content of the case clause should be extracted into a dedicated method."
11.
"throws" declarations should not be superfluous. "An exception in a throws
declaration in Java is superfluous if it is:
* listed multiple times
* a subclass of another listed exception
* a RuntimeException, or one of its descendants
* completely unnecessary because the declared exception type cannot
actually be thrown"
12. Class variable fields should not have public accessibility. "Public
class variable fields do not respect the encapsulation principle and has three
main disadvantages:
* Additional behavior such as validation cannot be added.
* The internal representation is exposed, and cannot be changed afterwards.
* Member values are subject to change from anywhere in the code and may not
meet the programmer's assumptions.
By using private attributes and accessor methods (set and get),
unauthorized modifications are prevented."
Again these are mostly stylistic preference and history combined. We try to
avoid making unnecessary changes to our code base, and with 11 and 12 these
form part of our public API which we have to take great care in making changes
to in order to ensure we don’t create problems for our users when they upgrade
Versions.
I would like to ask you:
* If you agree/disagree (and why)
* Are you aware of those?
* Why do you think that they introduced?
Comments with regards to your first three questions are inline above.
I think my general comment is that much of these “issues” are stylistic
preferences in my personal opinion. The Oracle Java code style referenced by a
few of those rules is a specific example of a style, while projects may follow
some aspects of it each project tends to evolve its own style and may
incorporate several different styles as it evolves over time. There are other
base styles but I don’t think I have ever worked on a single project, neither
open source or commercial, that has wholesale adopted one of those. Even in
project that strictly enforced styles those styles have been tailored to the
project in question.
* Do you think that all those introduced because Jena does not enforce
a "squash-before-merging" policy?
This has zero relevance in my opinion. This is primarily down to the long
history of the code base and the fact that we haven’t ever enforced a specific
code style. While we try and avoid introducing these issues they are very much
a minor concern compared to addressing actual functional bug reports.
* Are you planing to fix some issues of the above categories?
Probably not. As with all open source projects Jena depends on volunteers to
donate their time. Volunteers typically choose to spend their time on feature
and bug work, people are not interested on spending their time on cleaning up
years of stylistic inconsistency which as I stated are often personal
preference issues anyway.
Also where things form the public API we make a concerted effort to avoid
unnecessary changes. And where we do consider the changes necessary we follow a
deprecation process that means old methods and fields often continue to exist
for several releases alongside their new equivalents before being removed.
Rob
* All the currently open issues are listed on the sheet: Jena: Open
Issues - October 7, 2017 of
this<https://docs.google.com/spreadsheets/d/1DloQ_GS9l2KS6ldgdHOQkjsCB1J_rrMyUauHC_Ymgfk/edit?usp=sharing>
spreadsheet
Thank you in advance!
With kind regards,
George Digkas
________________________________
From: Andy Seaborne <[email protected]>
Sent: Saturday, November 18, 2017 6:43 PM
To: Γεώργιος Δίγκας; [email protected]
Subject: Re: Issues fixed in Apache Jena
Jena was registered at SourceForge 2001-11-20
I found this in ASF SVN:
[[
Added Mon Nov 26 17:41:44 2001 UTC (15 years, 11 months ago) by der
]]
on copyright.txt.
so it looks like we have full history somewhere in the Apache
infrastructure.
CVS:SF->SVN:SF->SVN:ASF
ASF git does not include the pre-git history.
Andy
On 16/11/17 11:55, Andy Seaborne wrote:
> Do not take git as complete!
>
> Jena started in 2000.
> https://lists.w3.org/Archives/Public/www-rdf-interest/2000Aug/0128.html
>
> Jena 2.0 was released 2003-08-28.
> A whole 40M including dependencies! A 14.7M zip file!
> https://sourceforge.net/projects/jena/files/
>
> The whole of SF SVN history was imported by the Apache infrastructure
> team (a herculean effort) into Apache SVN. I don't know how to get to it
> from git, it may not be there and only in SVN.
>
> The earliest git root commit is for the move to Apache from SF
> [4298106f1e], 6 years ago. (There are 4 root commits due to merges)
>
> ---
>
> It's an interesting start and to make the analysis usefully inform the
> reader as to the state of the project I suggest treating different kinds
> of issues different, not uniformly important.
>
> There are many (, many) minor things and they outweigh the major
> problems. Calling them all "issues" gives them equal weight. Some are
> about canonicalization of the code.
>
> Yet reformatting the whole code base (if practical, which it arguable)
> then greatly decreases the usefulness of git history. That would be a
> huge loss.
>
> (NB the "issue" word has a specific meaning for JIRA which a lot of
> Apache projects use. Jena's current total, now, is 1424.)
>
> Andy
>
>>
>> Thank you in advance!
>>
>>
>> With kind regards,
>>
>> George Digkas