Re: [Haskell-community] 2018 state of Haskell survey results

Gershom B Sun, 18 Nov 2018 09:52:35 -0800

Hi Taylor. I think we're closer to the real results here, but I'm
still pretty sure that there are a fair number of phony responses. In
particular, looking at your filter function, I don't think that _all_
bogus responses said "I dislike it" with regards to the ghc release
schedule. A fair number that hit all the other criteria also seem to
have left it blank. I suspect this will be enough to do the trick, but
can't be sure...
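Gershom's suggested loosening of the filter can be sketched as follows. This is a minimal sketch, not Taylor's actual filter function. It assumes hypothetical column names (`country`, `education`, `release_schedule`) and stands in for the filter's other criteria with a single "no demographics" test. A response that skipped demographics is flagged whether it answered "I dislike it" on the release schedule or left that question blank.

```python
from collections.abc import Mapping

# Hypothetical column names; the real survey CSV headers may differ.
DEMOGRAPHIC_FIELDS = ("country", "education")


def lacks_demographics(row: Mapping[str, str]) -> bool:
    """True when every demographic field is empty or missing."""
    return all(not row.get(name, "").strip() for name in DEMOGRAPHIC_FIELDS)


def looks_bogus(row: Mapping[str, str]) -> bool:
    """Flag a response that skipped demographics and either disliked the
    GHC release schedule or left that question blank."""
    schedule = row.get("release_schedule", "").strip()
    return lacks_demographics(row) and schedule in ("I dislike it", "")


# Hypothetical sample rows, for illustration only.
responses = [
    {"country": "", "education": "", "release_schedule": "I dislike it"},
    {"country": "", "education": "", "release_schedule": ""},
    {"country": "Norway", "education": "Masters", "release_schedule": ""},
    {"country": "", "education": "", "release_schedule": "I like it"},
]
kept = [r for r in responses if not looks_bogus(r)]
print(len(kept))  # 2
```

Treating a blank answer the same as "I dislike it" catches the scripted responses that skipped the question, at the cost of also dropping any genuine respondent who skipped both demographics and that question.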


This attempted sabotage of the survey is really frustrating and disappointing.

-g
On Sun, Nov 18, 2018 at 10:58 AM Taylor Fausak <tay...@fausak.me> wrote:
>
> I have filtered out the bogus responses and re-generated all the charts and 
> tables. You can see the updated results here: 
> https://github.com/tfausak/tfausak.github.io/blob/ee29da5bd8389c19763ac2b4dbe27ff5204161f5/_posts/2018-11-16-2018-state-of-haskell-survey-results.markdown
>
> Note that until I post the results on my blog, they are not published. Please 
> don't share the preliminary results on social media!
>
>
> On Sun, Nov 18, 2018, at 8:11 AM, Taylor Fausak wrote:
>
> Thanks for finding those anomalies, Gershom! I'm disappointed that someone 
> submitted bogus responses, apparently to tip the scales of Cabal versus 
> Stack. I intend to identify those responses and exclude them from the 
> results. The work you've done so far will help a great deal in finding them.
>
> You said that there are about 1,200 responses with demographic information. 
> That makes sense considering the number of submissions I got last year. Also, 
> there are 1,185 responses that included an answer to at least one of the 
> free-response questions. So perhaps whoever wrote the script didn't bother to 
> put an answer for those types of questions.
>
> Unfortunately I do not have precise submission times or IP address 
> information about submissions. Beyond what's in the CSV, the only other thing 
> I have is (some) email addresses.
>
> Fortunately I wrote a script to output all the charts and tables from the 
> survey responses. Once I've identified the problematic responses, I should be 
> able to update the script to ignore them and regenerate all the output.
>
>
> On Sun, Nov 18, 2018, at 3:40 AM, Chris Smith wrote:
>
> Sadly, it looks like a Cabal/Stack thing.  Of the responses with a country 
> provided, 618 of 1226 claim to use Cabal, and 948 of 1226 claim to use Stack. 
> Of the responses with no country, only 35 of 3868 claim to use Cabal, while 
> 3781 of the 3868 claim to use Stack.  Assuming independence, you'd expect 
> that last number to be about 50, meaning there are probably around 3700 fake 
> responses generated just to answer "Stack".
>
> To partially answer Simon's question, the flood of no-demographics responses 
> started on November 2, around the 750-response point, and continued unabated 
> through the close of the survey.  And, indeed, looking at just the first 750 
> responses gives similar distributions to what we get by ignoring the 
> no-demographic responses.  For example, of the first 750 responses, 359 claim 
> to use Cabal, and 568 claim to use Stack.
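Chris's estimate can be reproduced from the counts he quotes. This is a minimal sketch, assuming (as he does) that every no-country Cabal answer is genuine and that the Stack-to-Cabal ratio among genuine respondents does not depend on whether they gave a country. Under those assumptions the expected number of genuine no-country Stack answers comes out near 54, in line with his "about 50".

```python
# Counts quoted in the message above.
with_country = 1226
cabal_with_country = 618
stack_with_country = 948

cabal_no_country = 35
stack_no_country = 3781

# Stack users per Cabal user among respondents who gave a country (about 1.53).
stack_per_cabal = stack_with_country / cabal_with_country

# Expected genuine Stack answers in the no-country group (about 54).
expected_stack = cabal_no_country * stack_per_cabal

# Stack answers beyond that expectation, i.e. the likely fakes (about 3727).
excess = stack_no_country - expected_stack

print(f"expected Stack answers: {expected_stack:.0f}")
print(f"excess (likely fake): {excess:.0f}")

# The first 750 responses, received before the flood began.
first_750 = 750
cabal_first_750 = 359
stack_first_750 = 568
print(f"Cabal share: {cabal_with_country / with_country:.1%} vs {cabal_first_750 / first_750:.1%}")
print(f"Stack share: {stack_with_country / with_country:.1%} vs {stack_first_750 / first_750:.1%}")
```

The first-750 shares land within a few percentage points of the with-country shares, which is the similarity described above.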
>
> On Sun, Nov 18, 2018 at 2:31 AM Simon Marlow <marlo...@gmail.com> wrote:
>
> Good spot Gershom. Maybe it would be revealing to look at the times that 
> responses were received for the no-demographics group?
>
> On Sun, 18 Nov 2018, 07:17 Gershom B <gersh...@gmail.com> wrote:
>
> I also noticed a number of other bizarre statistical anomalies when looking
> at the full results. I know this is a bit much to ask — but if you could 
> rerun the statistics filtering out people that did not give demographic 
> information (i.e. country of origin or education, etc) I think the results 
> will change drastically. By all statistical logic, this should _not_ be the 
> case, and points to a serious problem.
>
> In particular, this drops the results by a huge amount — only 1,200 or so 
> remain. However, the remaining results tend to make a lot more sense. For 
> example — of the “no demographics” group, there are 713 users who claim to 
> develop with notepad++ but all of these say they develop on mac and linux, 
> and none on windows — which is impossible, as notepad++ is a windows program. 
> Further if you drop the “no demographics” group, then you find that almost 
> everyone uses at least ghc 8.0.2, while in the “no demographics” group, a
> stunning number of people claim to be on 7.8.3. Even more bizarrely, people 
> claim to be using the 7.8 series while only having used Haskell for less than 
> one year. And people claim to have used haskell for “one week to one month” 
> and also to be advanced and expert users!
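The contradictions listed above can each be expressed as a per-response check. This is a minimal sketch, assuming a hypothetical `Response` record whose field names and answer labels stand in for the survey's real columns (the actual CSV headers are not given in this thread).

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    # Hypothetical fields and answer labels, not the survey's real columns.
    editors: set[str] = field(default_factory=set)
    platforms: set[str] = field(default_factory=set)
    ghc_versions: set[str] = field(default_factory=set)
    experience: str = ""
    skill: str = ""


# Hypothetical labels for "used Haskell for less than one year".
NEW_TO_HASKELL = {"less than one week", "one week to one month", "one month to one year"}


def anomalies(r: Response) -> list[str]:
    """Return the internal contradictions found in one response."""
    found = []
    # Notepad++ is a Windows program.
    if "notepad++" in r.editors and "windows" not in r.platforms:
        found.append("notepad++ without windows")
    # A newcomer in 2018 is unlikely to be on the old 7.8 series.
    if r.experience in NEW_TO_HASKELL and any(v.startswith("7.8") for v in r.ghc_versions):
        found.append("ghc 7.8 but new to haskell")
    # A few weeks of experience does not square with an expert self-rating.
    if r.experience == "one week to one month" and r.skill in {"advanced", "expert"}:
        found.append("expert after a few weeks")
    return found


suspect = Response(
    editors={"notepad++"},
    platforms={"mac", "linux"},
    ghc_versions={"7.8.3"},
    experience="one week to one month",
    skill="expert",
)
typical = Response(
    editors={"vim"},
    platforms={"linux"},
    ghc_versions={"8.4.4"},
    experience="one to two years",
    skill="intermediate",
)
print(anomalies(suspect))  # all three checks fire
print(anomalies(typical))  # []
```

Counting how many of these checks fire in each group gives a quick, if crude, way to compare the "demographics" and "no demographics" populations.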
>
> The differences continue and defy all probability. Of the “no demographics” 
> group, almost everyone dislikes the new release schedule. Of the 
> “demographics” group there are answers that like it, were not aware of it, or 
> are indifferent, but almost nobody dislikes it. There is naturally a 
> difference in proportions of cabal/stack and hackage/stackage responses as 
> well.
>
> There are a lot of other things I could point to as well. But, bluntly put, I 
> think that some disaffected party or parties wrote a crude script and 
> submitted over 3,000 fake responses. Luckily for us, they were not very 
> smart, and made some obvious errors, so in this case we can weed out the bad 
> responses (although, sadly, losing at least a few real ones as well).
>
> However, assuming this party isn’t entirely stupid, it doesn’t bode well for
> future surveys as they may get at least slightly less dumb in the future if 
> they decide to keep it up :-/
>
> —Gershom
>
>
>
> On November 18, 2018 at 1:10:31 AM, Gershom B (gersh...@gmail.com) wrote:
>
>
>
> This is interesting, but I’m thoroughly confused. Over 2500 people said they 
> took last year’s survey, but it only had roughly 1,300 respondents?
>
>
> On Sat, Nov 17, 2018 at 9:56 PM Taylor Fausak <tay...@fausak.me> wrote:
>
> Hello! It took a little longer than I expected, but I am nearly ready to 
> announce the 2018 state of Haskell survey results. Some community members 
> have expressed interest in seeing the announcement post before it's 
> published. If you are one of those people, you can see the results here: 
> https://github.com/tfausak/tfausak.github.io/blob/7e4937e284a3068add9e9af6b585c8d0215ff360/_posts/2018-11-16-2018-state-of-haskell-survey-results.markdown
>
> If you would like to suggest changes to the announcement post, please respond 
> to this email, send me an email directly, or reply to this pull request on 
> GitHub: https://github.com/tfausak/tfausak.github.io/pull/148
>
> I plan on publishing the results tomorrow. Once the results are published, 
> the post is by no means set in stone. I will happily accept suggestions from 
> anyone at any time.
>
> Thank you!
> _______________________________________________
> Haskell-community mailing list
> Haskell-community@haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-community