Thank you for these explanations, Claude.

Looking at your second paragraph about the proposal to enhance the code that inserts headers, a comment start definition for Java files of '/*\n' (newline after the '/*') should work to accept the Apache license header in a Java comment but reject it if it's in a Javadoc comment. That seems promising, and I'll take a look at RAT-330, but I'm also able to move forward in Drill using alternatives in the interim.
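To make the idea concrete, here is a minimal sketch of what a '/*\n' comment start definition would distinguish. It is not RAT code: the class name CommentStartCheck and the method startsWithPlainBlockComment are hypothetical, and it assumes the header is the very first thing in the file (no leading whitespace or byte order mark).

```java
final class CommentStartCheck {
    private CommentStartCheck() {
    }

    // Hypothetical helper: true only when the text opens with a plain block
    // comment, meaning "/*" directly followed by a line break. A Javadoc
    // opener has a second asterisk in that position, so it does not match.
    static boolean startsWithPlainBlockComment(String source) {
        return source.startsWith("/*\n") || source.startsWith("/*\r\n");
    }
}
```

One consequence of this definition is that a header written as '/* Licensed to ...' on a single opening line would also be rejected, which may or may not be what Drill wants.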

Regards
James


On 2024/01/29 09:33, Claude Warren wrote:
James,

In general, the processing for matching licenses strips out all
non-essential text (e.g. '/' and '*'), so the current implementation cannot
determine whether the license text is within a Javadoc block or not.  Some
matchers (e.g. Copyright, SPDX, and regex) do use the unmodified text, but
they are generally much slower.  In fact, the original SPDX and Copyright
implementations caused a significant (two orders of magnitude or more)
increase in processing time.  It would be possible to create a custom
matcher to do what you want, but there is currently no mechanism in the
code base to call a matcher only on specific file types.
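
To illustrate why the comment style is lost, here is a simplified sketch.  It
is not the actual RAT implementation: PruneSketch and its prune method are
hypothetical stand-ins, and keeping only letters and digits is an assumption
about what "non essential" means.

```java
final class PruneSketch {
    private PruneSketch() {
    }

    // Hypothetical stand-in for the pruning step: keep only letters and
    // digits, lower-cased, and drop everything else (including '/' and '*').
    static String prune(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                sb.append(Character.toLowerCase(c));
            }
        }
        return sb.toString();
    }
}
```

With this kind of pruning, a header that opens with '/*' and one that opens
with '/**' produce exactly the same string, so a matcher working on the pruned
text has nothing left to tell them apart.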

There is a section of code that understands file types, but this is the
code that inserts headers into files that don't have them.  It may be
possible to build on that to create a custom matcher to ensure that license
comments are not within Javadoc comments.  There is a ticket open to modify
how this code works so that new file types, with comment start/stop
definitions and restrictions on first lines and such, can be defined outside
of the codebase, making it possible to insert headers in as yet unrecognized
file formats.[1]  This might be extended to provide input to the process you
are requesting.

There is also a section of code that removes the non-essential text.  The
'prune' method could be modified to remove blocks of code between the
opening Javadoc '/**' and the closing '*/'.  But this may lead to problems
with non-Java files.  Speaking of non-Java files, have you thought about
ensuring that the license does not appear in other Javadoc-like systems?
[2]  Once this can of worms is opened we will need a way to manage all the
requests that will follow for other file types.
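
As a rough sketch of the Javadoc removal idea (again, not existing RAT code;
JavadocStripSketch and stripJavadocBlocks are hypothetical names), a regular
expression could drop everything between '/**' and the next '*/' before the
text is pruned:

```java
import java.util.regex.Pattern;

final class JavadocStripSketch {
    private JavadocStripSketch() {
    }

    // "/**" not immediately followed by "/" (so the empty comment "/**/" is
    // left alone), then the shortest run of any characters up to the next
    // "*/". DOTALL lets "." match line breaks inside the comment.
    private static final Pattern JAVADOC_BLOCK =
            Pattern.compile("/\\*\\*(?!/).*?\\*/", Pattern.DOTALL);

    // Hypothetical pre-processing step: remove Javadoc blocks so that a
    // license placed inside one would no longer be seen by the matchers.
    static String stripJavadocBlocks(String source) {
        return JAVADOC_BLOCK.matcher(source).replaceAll("");
    }
}
```

A pattern like this knows nothing about string literals or about the syntax
of other languages, so applied to every file it could remove text it should
not, which is the problem with non-Java files mentioned above.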

If you have any ideas for implementing the change I would be interested to
hear them.

Claude

[1] https://issues.apache.org/jira/browse/RAT-330
[2] https://stackoverflow.com/questions/5334531/using-javadoc-for-python-documentation

On Fri, Jan 26, 2024 at 2:38 PM James Turton <dz...@apache.org> wrote:

Thanks Phil.

Here's some background [1] which comes from before I was involved with
Drill. What they wanted was for the license header checker to accept, in
.java files,

/*
   * Licensed to the Apache Software Foundation (ASF) under one
   * or more contributor license agreements.  See the NOTICE file
   * distributed with this work for additional information
     etc.

but reject

/**
   * Licensed to the Apache Software Foundation (ASF) under one
   * or more contributor license agreements.  See the NOTICE file
   * distributed with this work for additional information
     etc.

Notice the two asterisks that open the Java comment block in the second
form thereby making it a Javadoc comment that will appear in generated
Javadoc. There are no longer any examples of the latter in Drill, but only
because this has been enforced by the addition of the license-maven-plugin.

I got here because I want to remove that plugin, which essentially
duplicates RAT, in favour of another (with exactly the same name :()
that can generate license and notice information for our third party
code. This last task is what I'm really doing; the Javadoc license
header rejection matter is yak shaving that came up on the road.

So my yak shaving question is: if I make RAT Drill's only license header
checker then could I make it reject license headers of the second form?
Even if I can't, I'm inclined to make it the only header checker since I
think that it's in any case mandatory and authoritative. But in an
effort to retain the work of the previous Drill developers I'm trying to
preserve what they implemented.

1. https://issues.apache.org/jira/browse/DRILL-6320

On 2024/01/26 14:06, P. Ottlinger wrote:
Hi James,

thanks for reaching out!

On 26.01.24 at 08:21, James Turton wrote:
I'd like to ask about a feature to prevent RAT from allowing license
headers to appear inside Javadoc comments (/**) while still requiring
them in Java comments (/*) in .java files. Currently the Drill project
makes use of com.mycila.license-maven-plugin to reject licenses in
Javadoc comments because the developers at the time didn't want
license headers cluttering the Javadoc website that is generated from
the source. Are you aware of a general view on Apache license headers
appearing in Javadoc pages? If preventing them from doing so is a good
idea, could this become a (configurable) feature in RAT?

Could you be so kind as to provide an example of what you want to achieve
and what your use case looks like?

I'm afraid I do not really understand what you mean by
javadoc-specific licenses?

At the moment we don't have file-specific parsing to exclude comments
- is that what you want to achieve?

On the other hand if a license header is needed per file, it has to be
somewhere in the sources ;)

Thanks,
Phil

