James,

If the extra check is costly, you might also observe that all (most?) existing files have the proper header format. It is only new or changed files that must be checked. So, you can use Git to determine the change set on each PR and do the extra format check only on those files.
- Paul

On Mon, Jan 29, 2024 at 7:37 AM James Turton <dz...@apache.org> wrote:

> Thank you for these explanations, Claude.
>
> Looking at your second paragraph about the proposal to enhance the code
> that inserts headers, a comment start definition for Java files of
> '/*\n' (newline after the '/*') should work to accept the Apache license
> header in a Java comment but reject it if it's in a Javadoc comment.
> That seems promising, and I'll take a look at RAT-330, but I'm also able
> to move forward in Drill using alternatives in the interim.
>
> Regards
> James
>
>
> On 2024/01/29 09:33, Claude Warren wrote:
> > James,
> >
> > In general, the processing for matching licenses strips out all
> > non-essential text (e.g. '/' and '*'), so the current implementation cannot
> > determine whether the license text is within a Javadoc block or not. Some
> > matchers (e.g. Copyright, SPDX, and regex) do use the unmodified text, but
> > they are generally much slower. In fact, the original SPDX and Copyright
> > implementations caused a significant (two orders of magnitude or more)
> > increase in processing time. It would be possible to create a custom
> > matcher to do what you want, but there is currently no mechanism in the
> > code base to call a matcher only on specific file types.
> >
> > There is a section of code that understands file types, but this is the
> > code that inserts headers into files that don't have them. It may be
> > possible to build on that to create a custom matcher to ensure that
> > license comments are not within Javadoc comments. There is a ticket open
> > to modify how this code works so that new file types, with comment start
> > and stop definitions and restrictions on first lines and such, can be
> > defined outside of the codebase, making it possible to insert headers in
> > as yet unrecognized file formats. [1] This might be extended to provide
> > input to the process you are requesting.
> >
> > There is also a section of code that removes the non-essential text. The
> > 'prune' method could be modified to remove blocks of code between the
> > opening Javadoc '/**' and the closing '*/'. But this may lead to problems
> > with non-Java files. Speaking of non-Java files, have you thought about
> > ensuring that the license does not appear in other Javadoc-like systems?
> > [2] Once this can of worms is opened, we will need a way to manage all
> > the requests that will follow for other file types.
> >
> > If you have any ideas for implementing the change, I would be interested
> > to hear them.
> >
> > Claude
> >
> > [1] https://issues.apache.org/jira/browse/RAT-330?
> > [2] https://stackoverflow.com/questions/5334531/using-javadoc-for-python-documentation
> >
> > On Fri, Jan 26, 2024 at 2:38 PM James Turton <dz...@apache.org> wrote:
> >
> >> Thanks Phil.
> >>
> >> Here's some background [1] which comes from before I was involved with
> >> Drill. What they wanted was for the license header checker to accept, in
> >> .java files,
> >>
> >> /*
> >> * Licensed to the Apache Software Foundation (ASF) under one
> >> * or more contributor license agreements. See the NOTICE file
> >> * distributed with this work for additional information
> >> etc.
> >>
> >> but reject
> >>
> >> /**
> >> * Licensed to the Apache Software Foundation (ASF) under one
> >> * or more contributor license agreements. See the NOTICE file
> >> * distributed with this work for additional information
> >> etc.
> >>
> >> Notice the two asterisks that open the Java comment block in the second
> >> form, thereby making it a Javadoc comment that will appear in generated
> >> Javadoc. There are no longer any examples of the latter in Drill but
> >> this has been enforced by the addition of the license-maven-plugin.
> >>
> >> I got here because I want to remove that plugin, which essentially
> >> duplicates RAT, in favour of another (with exactly the same name :()
> >> that can generate license and notice information for our third party
> >> code. This last task is what I'm really doing; the Javadoc license
> >> header rejection matter is yak shaving that came up on the road.
> >>
> >> So my yak shaving question is: if I make RAT Drill's only license header
> >> checker then could I make it reject license headers of the second form?
> >> Even if I can't, I'm inclined to make it the only header checker since I
> >> think that it's in any case mandatory and authoritative. But in an
> >> effort to retain the work of the previous Drill developers I'm trying to
> >> preserve what they implemented.
> >>
> >> 1. https://issues.apache.org/jira/browse/DRILL-6320
> >>
> >> On 2024/01/26 14:06, P. Ottlinger wrote:
> >>> Hi James,
> >>>
> >>> thanks for reaching out!
> >>>
> >>> On 26.01.24 at 08:21, James Turton wrote:
> >>>> I'd like to ask about a feature to prevent RAT from allowing license
> >>>> headers to appear inside Javadoc comments (/**) while still requiring
> >>>> them in Java comments (/*) in .java files. Currently the Drill project
> >>>> makes use of com.mycila.license-maven-plugin to reject licenses in
> >>>> Javadoc comments because the developers at the time didn't want
> >>>> license headers cluttering the Javadoc website that is generated from
> >>>> the source. Are you aware of a general view on Apache license headers
> >>>> appearing in Javadoc pages? If preventing them from doing so is a good
> >>>> idea, could this become a (configurable) feature in RAT?
> >>> Could you be so kind as to provide an example of what you want to
> >>> achieve and what your use case looks like?
> >>>
> >>> I'm afraid I do not really understand what you mean by
> >>> javadoc-specific licenses?
> >>>
> >>> At the moment we don't have file-specific parsing to exclude comments
> >>> - is that what you want to achieve?
> >>>
> >>> On the other hand, if a license header is needed per file, it has to be
> >>> somewhere in the sources ;)
> >>>
> >>> Thanks,
> >>> Phil
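
Paul's suggestion at the top of the thread (check only the files in a PR's change set) and the accept/reject rule James describes (plain '/*' header accepted, Javadoc '/**' header rejected) could be sketched as a small standalone script. This is only a sketch under stated assumptions, not RAT or Drill code: the function names, the `origin/master` base branch and the regex-based comment scan are all hypothetical, and the scan is naive (for example, it does not account for '/*' appearing inside string literals).

```python
import re
import subprocess

# Phrase taken from the license header quoted in the thread.
LICENSE_PHRASE = "Licensed to the Apache Software Foundation"

# Matches any /* ... */ comment, Javadoc or not, non-greedily.
COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)


def header_comment_kind(source: str) -> str:
    """Classify the first comment containing the license phrase.

    Returns 'javadoc' if that comment opens with '/**', 'block' if it
    opens with a plain '/*', and 'none' if no comment holds the phrase.
    """
    for match in COMMENT_RE.finditer(source):
        comment = match.group(0)
        if LICENSE_PHRASE in comment:
            return "javadoc" if comment.startswith("/**") else "block"
    return "none"


def changed_java_files(base: str = "origin/master") -> list:
    """List .java files added or modified on HEAD relative to `base`.

    The default base branch name is an assumption; use whatever branch
    the PR targets.
    """
    result = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=AM", f"{base}...HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return [p for p in result.stdout.splitlines() if p.endswith(".java")]


def main() -> int:
    """Return 1 if any changed file has its header in a Javadoc comment."""
    offenders = []
    for path in changed_java_files():
        with open(path, encoding="utf-8") as handle:
            if header_comment_kind(handle.read()) == "javadoc":
                offenders.append(path)
    for path in offenders:
        print(f"license header inside Javadoc comment: {path}")
    return 1 if offenders else 0
```

A CI step could call `sys.exit(main())` from inside the repository checkout. Because the check reads the raw file text rather than the stripped text that the license matchers use, restricting it to the change set keeps the extra cost small, while RAT remains the authoritative check that a header is present at all.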