Looks like a whole lot of the results have been analyzed. I suspect there's
more than enough to act on already. I think we should wait until after 2.2
is done.
Does anybody have a preference for how to proceed here? Should we just open
a JIRA to take care of a batch of related types of issues and go for it?

On Sat, Jun 17, 2017 at 4:45 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:

> Gentle ping to dev for help. I hope this effort is not abandoned.
>
>
> On 25 May 2017 9:41 am, "Josh Rosen" <joshro...@databricks.com> wrote:
>
> I'm interested in using the Scapegoat
> <https://github.com/sksamuel/scapegoat> Scala compiler plugin to find
> potential bugs and performance problems in Spark. Scapegoat has a useful
> built-in set of inspections and is pretty easy to extend with custom ones.
> For example, I added an inspection to spot places where we call *.apply()* on
> a Seq which is not an IndexedSeq
> <https://github.com/sksamuel/scapegoat/pull/159> in order to make it
> easier to spot potential O(n^2) performance bugs.
>
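To illustrate the kind of pattern the Seq.apply inspection described above is meant to catch, here is a minimal Scala sketch. The object and method names are hypothetical and are not taken from Spark or Scapegoat. It relies only on the standard-library behavior that indexing a List walks it from the head, while indexing a Vector is effectively constant time.

```scala
object SeqApplyDemo {
  // Indexing inside a loop. When xs is a List, each xs(i) walks the list
  // from the head, so the loop as a whole is O(n^2).
  def sumByIndex(xs: Seq[Int]): Long = {
    var total = 0L
    var i = 0
    while (i < xs.length) {
      total += xs(i)
      i += 1
    }
    total
  }

  // One possible fix: convert to an IndexedSeq once, then index into that.
  def sumByIndexFast(xs: Seq[Int]): Long = sumByIndex(xs.toIndexedSeq)

  // Another possible fix: avoid indexing and traverse the sequence directly.
  def sumByTraversal(xs: Seq[Int]): Long = xs.foldLeft(0L)(_ + _)
}
```

All three methods return the same sum; only the first is slow when handed a long List, which is the situation the inspection tries to surface.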
> There are lots of false-positives and benign warnings (as with any linter
> / static analyzer) so I don't think it's feasible for us to include this as
> a blocking step in our regular build. I am planning to build tooling to
> surface only new warnings so going forward this can become a useful
> code-review aid.
>
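One way the "surface only new warnings" idea above could work is to compare the warnings from a baseline revision against those from the current one. The Scala sketch below is only an assumption about how such tooling might look, not the tooling Josh describes: the Warning record, its fields, and the inspection names used with it are all hypothetical. It matches warnings on file and inspection rather than line number, so that unrelated edits which shift lines do not make old warnings look new.

```scala
object NewWarnings {
  // Hypothetical record; a real Scapegoat report carries more fields.
  final case class Warning(file: String, line: Int, inspection: String)

  // Matching key that deliberately ignores the line number.
  private def key(w: Warning): (String, String) = (w.file, w.inspection)

  // Returns the warnings in `current` that have no counterpart in `baseline`.
  // Counts are tracked per key, so a second occurrence of an already-known
  // warning in the same file is still reported as new.
  def newlyIntroduced(baseline: Seq[Warning], current: Seq[Warning]): Seq[Warning] = {
    val remaining =
      scala.collection.mutable.Map.empty[(String, String), Int].withDefaultValue(0)
    baseline.foreach(w => remaining(key(w)) += 1)
    current.filter { w =>
      val k = key(w)
      if (remaining(k) > 0) {
        remaining(k) -= 1 // consumed by a baseline warning, so not new
        false
      } else {
        true
      }
    }
  }
}
```

Matching on file and inspection alone is a coarse heuristic. It cannot tell which of two identical warnings in a file is the old one, so it simply treats the first occurrence as known.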
> The current codebase has roughly 1700 warnings that I would like to triage
> and categorize as false-positives or real bugs. I can't do this alone, so
> here's how you can help:
>
>    - Visit the Google Docs spreadsheet at
>    https://docs.google.com/spreadsheets/d/1z7xNMjx7VCJLCiHOHhTth7Hh4R0F6LwcGjEwCDzrCiM/edit?usp=sharing
>    and find an un-triaged warning.
>    - In the columns at the right of the sheet, enter your name in the
>    appropriate column to mark a warning as a false-positive or as a real bug
>    and/or performance issue. If you think a warning is a real issue then use the
>    "comments" column for providing additional detail.
>    - Please don't file JIRAs or PRs for individual warnings; I suspect
>    that we'll find clusters of issues which are best fixed in a few larger PRs
>    vs. lots of smaller ones. Certain warnings are probably simply style issues
>    so we should discuss those before trying to fix them.
>
> The sheet has hidden columns capturing the Spark revision and Scapegoat
> revision. I can use this to programmatically update the sheet and remap
> lines after updating either Scapegoat (to suppress false-positives) or
> Spark (to incorporate fixes and surface new warnings). For those who are
> interested, the sheet was produced with this script:
> https://gist.github.com/JoshRosen/1ae12a979880d9a98988aa87d70ff2a8
>
> Depending on the results of this experiment we might want to integrate a
> high-signal subset of the Scapegoat warnings into our build. I'm also
> hoping that we'll be able to build a useful corpus of triaged warnings in
> order to help improve Scapegoat itself and eliminate common false-positives.
>
> Thanks and happy bug-hunting,
> Josh Rosen
>
>
>
