I actually only now found time to read the thread (and the earlier
announcement), and, well, I would love to see how we do at Apache Airflow.

I understand, Rich, that you've been flooded with some unexpected
push-back, and I can understand why. But for one thing, I had no time to
even look at the report, so I am a bit sad it is down now and I cannot
check it. I would encourage you to get it back online - and I have some
observations and maybe a suggestion.

To cheer you up, Rich: quite possibly, the amount of backlash you
received is not even close to the positive responses you HAVE NOT
RECEIVED YET. It's far too early to say. Even in far simpler cases
Apache mandates a minimum of 72 hours, and I think the report was up
for, well, about a day? In most cases that is not nearly enough time for
people to even realise that anything like the CLC tool exists!

I think that, given the backlash and the "perceived" sensitivity of the
issue, a good idea might be to enable it for all projects but only show
a project's data to the committers of that project (I believe it was
publicly available for all projects)?
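
To make that concrete - a minimal sketch of the visibility rule I have
in mind, entirely mine and not the CLC tool's actual code. The committer
lookup is a hypothetical stand-in for whatever directory (LDAP, the ASF
roster) the tool would really query, and the project and user names are
made up:

    # Hypothetical example data - the real tool would query the ASF roster.
    COMMITTERS = {
        "airflow": {"alice", "bob"},
        "commons-csv": {"carol"},
    }

    def can_view_report(project: str, user: str) -> bool:
        """A user may see a report only for projects they commit to."""
        return user in COMMITTERS.get(project, set())

    def get_report(project: str, user: str) -> str:
        if not can_view_report(project, user):
            raise PermissionError(f"{user} is not a committer on {project}")
        return f"<report for {project}>"  # placeholder for the real report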

That might remove one potential reason why people object and push back
against it: they are afraid their projects will be compared to others
and will fare "worse", that this will be visible to others (even
publicly), and that this might create artificial pressure on them. And I
perfectly understand that, especially if there are plenty of false
positives. But even if there are none, I would not feel good knowing
that I am at the bottom of the list and have no time to fix it, or that
I don't feel it is wrong at all and my community is OK with it - and you
should be free to ignore the report if your community is OK with that.

Also, I think the report SHOULD be visible to all committers in the
project, and they should be made aware of its existence. No more, no
less. No pressure; use it as you want. Quite possibly even the PMC
members are not aware that some committers are offended by some language
in their project and would not openly speak about it. And maybe those
committers, seeing their project's report, will simply ... start fixing
those issues, even without a decision from the PMC members. If other
committers agree with them, the PMC might not lead that effort, or in
many cases might not even be aware of those changes. There is no need
for that, IMHO. It's not the role of PMC members to make such decisions;
it's the decision of each project community as a whole, the same as any
other project decision. I do not think any PMC member's opinion matters
here in isolation; this should be a matter for the project's community.

Also, I believe that, despite all the pushback you got, there might be
some PMCs and committers actually interested in the results - HECK,
interested in improving the tool if it shows false positives!
My absolutely natural reaction when I see a discussion like the one
quoted below is "Where can I submit my fix?". I think running it on the
codebases of all ASF projects is a GREAT test ground. What sebb
mentioned about lots of false positives is probably right. But my
natural instinct and reaction, being part of the Apache Community, is
not "let someone do the testing first before I get to see my results"
but "where can I submit a PR to fix the false positives I see in my
project?".
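
To show the kind of PR I mean, here is a rough sketch - mine, with an
illustrative term list, not the tool's actual matching code - of the
conservative, whitespace-bounded first pass that sebb suggests further
down the thread; later passes could gradually relax the boundaries:

    import re

    # Illustrative term list only - the real tool maintains its own.
    TERMS = ["he", "master"]

    # Conservative first pass: flag a term only when it is bracketed by
    # whitespace (or line boundaries), so "he" inside "|=he|" test data
    # or "master" inside a URL path like "/blob/master/" is not flagged.
    WORD_RE = re.compile(
        r"(?:(?<=\s)|^)(" + "|".join(map(re.escape, TERMS)) + r")(?=\s|$)",
        re.IGNORECASE,
    )

    def find_hits(line: str) -> list[str]:
        """Return the flagged terms standing alone on this line."""
        return [m.group(1) for m in WORD_RE.finditer(line)]

    assert find_hits("merge it to the master branch") == ["master"]
    assert find_hits("see /blob/master/README and ..|=he|data|..") == []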

I WANT to become a tester of that tool, even at this early stage. I WANT
to improve it. I WANT to make it better, especially since I know that
potentially 350 other Apache projects might benefit from a simple fix.
I would love to be able to do that. IMHO, Rich's role here would at most
be to track the progress and improvements, to encourage others to help,
and to gather and present statistics on false positives and improvements
- maybe reporting them in aggregate form, so that as an organisation we
could see whether the tool works and improves over time. But it's OUR
role - the committers in those projects - to do the job, IMHO.

There is only one Rich Bowen in the ASF, and there are hundreds of
committers in hundreds of ASF projects: people who know how to code and
submit PRs. I see no reason why the testing should be done by that one
person, who probably has plenty of other things to do and is busy making
the community work (including running ApacheCon).

I really think we should bring it back for everyone (visible to
committers only) and ask everyone to help make the tool better while
fixing their own projects' problems along the way.

This is what I see as the Apache Way of doing it, and it is the natural
instinct I follow.

J,


On Wed, Sep 1, 2021 at 1:48 AM sebb <seb...@gmail.com> wrote:

> On Wed, 1 Sept 2021 at 00:30, Matt Sicker <boa...@gmail.com> wrote:
> >
> > Security scanners, compilers, linters, etc., are all incredibly noisy
> when first enabled on an existing codebase. I’d expect similarly for tools
> that attempt to lint the English language. If the tool were smart enough to
> avoid false positives, it might also be superintelligent. Remember, naming
> things is one of the hardest parts of programming!
>
> That is not the case here.
> The analogy of a compiler is particularly inappropriate: if a compiler
> reports a problem, it has to be fixed.
>
> The way words are detected currently is bound to cause false positives.
> As I already wrote, the way to handle this is to run the checker on a
> substantial codebase, and look for projects with an abnormal number of
> hits.
> Those are likely to be false positives. Fix those and try again.
>
> Again, if the first run does produce lots of hits, then be more
> conservative in matching for the initial run.
> For example, look for words which are bracketed by white-space and
> perhaps quotes, nothing more.
> If that produces no hits, gradually widen the matching.
>
> There is no point producing an initial analysis with hundreds of hits.
>
> Sorry, but IMO the problem here is insufficient testing.
>
> > Matt Sicker
> >
> > > On Aug 31, 2021, at 17:20, sebb <seb...@gmail.com> wrote:
> > >
> > > On Tue, 31 Aug 2021 at 17:50, Rich Bowen <rbo...@rcbowen.com> wrote:
> > >>
> > >>
> > >>
> > >>> On 8/31/21 12:24 PM, sebb wrote:
> > >>>
> > >>> That seems to me to be an overreaction.
> > >>
> > >> Yes, I can see that it would seem that way without a larger context.
> The
> > >> number of messages I have received on various lists, and off-list,
> > >> calling this effort wrong/bad/evil, have been ... demoralizing, shall
> we
> > >> say?
> > >>
> > >>> In my case, I have no complaints about the purpose of the analysis.
> > >>> It's the excessive false positives and UI of the software that is the
> > >>> problem, combined with a poorly worded email.
> > >>
> > >> I appreciate that you have no complaints about the purpose of the
> > >> analysis. Others do, and have made those complaints both very obvious
> > >> and very personal.
> > >>
> > >> While this is often the case with this conversation, the vitriol this
> > >> time has been somewhat disturbing. And that's from someone who has had
> > >> this conversation with probably 200 projects over the past 18 months.
> > >>
> > >>> I think what needs to happen is for a detailed investigation of the
> > >>> results, especially for projects that have lots of hits, so that the
> > >>> scanning can be properly tuned.
> > >>> It's pretty obvious at present that the scanning is far too eager to
> > >>> report issues (and not just master in URLs).
> > >>
> > >> I disagree.
> > >
> > > Have you actually looked at any of the scans?
> > > In the case of commons-csv, there were over 1800 reports of the use of
> > > 'he' in code.
> > > However these were all parts of a test data file, for example:
> > >
> > > ..|=he|פוזארוואץ|...
> > >
> > > I assume that is he for Hebrew; it should not have been flagged.
> > >
> > >> I think that highlighting all potential problematic
> > >> words/phrases is part of the message, whether or not the project in
> > >> question feels the need to address all of them. The purpose here is to
> > >> make people aware of how the words/phrases in their code and
> > >> documentation affect other people.
> > >>
> > >>> There also needs to be some work on the UI, to make it easier to
> > >>> ignore individual files, and to make it easier to actually edit the
> > >>> source files.
> > >>
> > >> It is not the goal of the tool to make editing source files easy or
> even
> > >> possible. It's a code analysis tool. Sure, it could link to the file
> in
> > >> the target repository, which may be what you're asking for. But it's
> not
> > >> intended to be a remediation tool.
> > >
> > > The easier you make it for projects, the more likely they are to use
> > > it and persuade others to do the same.
> > >
> > > I am only suggesting providing links to the Git repo files.
> > > Assuming that the tool has local checkouts of the repo, that should
> > > not be hard to do.
> > > At present not even the Github source repos listed at the head of the
> > > page are linked, and they are already URLs.
> > >
> > >>> There are some other issues, no doubt.
> > >>>
> > >>> Once the reports are usable without lots of effort by projects, then
> > >>> maybe start inviting a few random projects to see if they have any
> > >>> feedback on the analysis.
> > >>> Fix any issues, and gradually increase the number of projects.
> > >>>
> > >>> It might be an idea to send a follow-up email to explain why all the
> > >>> projects have been removed.
> > >>
> > >> One of the things we were reprimanded for was sending a cross-project
> > >> email about this topic in the first place. As such, I won't be
> > >> advocating sending a followup email on the same topic. Someone else
> is,
> > >> of course, welcome to pursue that avenue.
> > >>
> > >>>
> > >>> Though I think it would have been better to keep the projects (apart
> > >>> from retired ones), but send an email to say that the analyses are
> > >>> currently at the alpha stage, and solicit feedback on improving the
> > >>> scanning.
> > >>
> > >> They're *not* at an alpha stage. Those words *do* appear in the code.
> > >> And I'm already using this same tool elsewhere, as part of my day job.
> > >> It's a tool. It wasn't the tool that people objected to. It was the
> > >> analysis.
> > >
> > > I only object to the analysis inasmuch as it provides too many false
> positives.
> > >
> > >>>
> > >>> That way might result in analyses that projects actually want.
> > >>
> > >> My take-away was that the projects *don't* want this analysis.
> > >
> > > That may be true for some; I don't think it is true for all.
> > >
> > > But it will remain so unless the tool is a lot easier to use.
> > >
> > >>
> > >> --
> > >> Rich Bowen - rbo...@rcbowen.com
> > >> @rbowen
>
