Hi all,
Another gentle ping for help. If no one picks this up within the next couple of weeks, I'll probably open a JIRA and work on it myself, although I hope someone else takes it on. Thanks.

2017-06-18 2:16 GMT+09:00 Sean Owen <so...@cloudera.com>:
> Looks like a whole lot of the results have been analyzed. I suspect there's more than enough to act on already. I think we should wait until after 2.2 is done.
> Does anybody have a preference for how to proceed here? Should we just open a JIRA to take care of a batch of related types of issues and go for it?
>
> On Sat, Jun 17, 2017 at 4:45 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>
>> Gentle ping to dev for help. I hope this effort is not abandoned.
>>
>> On 25 May 2017 9:41 am, "Josh Rosen" <joshro...@databricks.com> wrote:
>>
>> I'm interested in using the Scapegoat <https://github.com/sksamuel/scapegoat> Scala compiler plugin to find potential bugs and performance problems in Spark. Scapegoat has a useful built-in set of inspections and is pretty easy to extend with custom ones. For example, I added an inspection to spot places where we call *.apply()* on a Seq which is not an IndexedSeq <https://github.com/sksamuel/scapegoat/pull/159> in order to make it easier to spot potential O(n^2) performance bugs.
>>
>> There are lots of false positives and benign warnings (as with any linter / static analyzer), so I don't think it's feasible for us to include this as a blocking step in our regular build. I am planning to build tooling to surface only new warnings so that, going forward, this can become a useful code-review aid.
>>
>> The current codebase has roughly 1700 warnings that I would like to triage and categorize as false positives or real bugs. I can't do this alone, so here's how you can help:
>>
>> - Visit the Google Docs spreadsheet at https://docs.google.com/spreadsheets/d/1z7xNMjx7VCJLCiHOHhTth7Hh4R0F6LwcGjEwCDzrCiM/edit?usp=sharing and find an un-triaged warning.
>> - In the columns at the right of the sheet, enter your name in the appropriate column to mark a warning as a false positive or as a real bug and/or performance issue. If you think a warning is a real issue, use the "comments" column to provide additional detail.
>> - Please don't file JIRAs or PRs for individual warnings; I suspect that we'll find clusters of issues which are best fixed in a few larger PRs vs. lots of smaller ones. Certain warnings are probably simply style issues, so we should discuss those before trying to fix them.
>>
>> The sheet has hidden columns capturing the Spark revision and Scapegoat revision. I can use these to programmatically update the sheet and remap lines after updating either Scapegoat (to suppress false positives) or Spark (to incorporate fixes and surface new warnings). For those who are interested, the sheet was produced with this script: https://gist.github.com/JoshRosen/1ae12a979880d9a98988aa87d70ff2a8
>>
>> Depending on the results of this experiment, we might want to integrate a high-signal subset of the Scapegoat warnings into our build. I'm also hoping that we'll be able to build a useful corpus of triaged warnings in order to help improve Scapegoat itself and eliminate common false positives.
>>
>> Thanks and happy bug-hunting,
>> Josh Rosen
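For anyone triaging the *.apply()* warnings, here is a minimal Scala sketch of the pattern that inspection targets. The object and method names are illustrative only, not taken from Spark or Scapegoat. Indexing into a `List` with `xs(i)` walks `i` nodes from the head, so an index-based loop over a `Seq` that happens to be a `List` is O(n^2) overall. Traversing the elements directly, or requiring an `IndexedSeq` such as `Vector`, keeps the loop linear.

```scala
object SeqApplyExample {

  // Index-based loop over a generic Seq. When xs is a List, each xs(i)
  // walks i nodes from the head, so the whole loop is O(n^2).
  def sumByIndex(xs: Seq[Int]): Int = {
    val n = xs.length // computed once; List.length is itself O(n)
    var total = 0
    var i = 0
    while (i < n) {
      total += xs(i) // linear-time lookup on a List: the call the inspection flags
      i += 1
    }
    total
  }

  // Fix 1: traverse the elements directly, which is O(n) for any Seq.
  def sumByTraversal(xs: Seq[Int]): Int = xs.foldLeft(0)(_ + _)

  // Fix 2: require an IndexedSeq (e.g. Vector), where apply(i) is
  // effectively constant time, so the same index-based loop is O(n).
  def sumByIndexIndexed(xs: IndexedSeq[Int]): Int = {
    var total = 0
    var i = 0
    while (i < xs.length) {
      total += xs(i)
      i += 1
    }
    total
  }

  def main(args: Array[String]): Unit = {
    val list = List.tabulate(10000)(i => i)
    println(sumByIndex(list))                 // quadratic on a List
    println(sumByTraversal(list))             // linear
    println(sumByIndexIndexed(list.toVector)) // linear
  }
}
```

All three return the same sum. Only the cost differs, which is why such warnings are worth checking against the runtime collection type rather than dismissing as style.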
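The plan to surface only new warnings, and to remap lines across revisions, can be sketched as a diff against a triaged baseline. The `Warning` type and `newWarnings` function below are hypothetical and do not reflect Scapegoat's report format or the tooling described above. The sketch assumes warnings are matched on file, inspection name, and the flagged source snippet rather than on line numbers, since line numbers shift whenever code above a warning changes.

```scala
// Hypothetical sketch: Warning and its fields are illustrative assumptions,
// not Scapegoat's actual report format.
object NewWarnings {

  final case class Warning(file: String, line: Int, inspection: String, snippet: String)

  // Match on (file, inspection, snippet) and ignore the line number,
  // so a warning whose code merely moved is not reported again.
  private def key(w: Warning): (String, String, String) =
    (w.file, w.inspection, w.snippet)

  // Returns warnings in `current` beyond those already present in `baseline`.
  // Counts are compared per key, so a second identical warning in the same
  // file is still reported as new.
  def newWarnings(baseline: Seq[Warning], current: Seq[Warning]): Seq[Warning] = {
    val baselineCounts: Map[(String, String, String), Int] =
      baseline.groupBy(w => key(w)).map { case (k, ws) => k -> ws.size }
    current
      .groupBy(w => key(w))
      .toSeq
      .flatMap { case (k, ws) => ws.drop(baselineCounts.getOrElse(k, 0)) }
  }
}
```

Matching on the snippet is a trade-off: it tolerates code moving around, but an edit to the flagged line itself will make an already-triaged warning reappear as new.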