Thanks to both Scott and Matthew for your detailed discussion. I am not
concerned about the speed of the algorithm, as it's relatively simple math.
The conceptual points that were made are worth more investigation, but given
the nature of the problem it will take a bit of time to think this through
and to run the tweaked algorithm on our sample data. So I am marking this as
Triaged. I think it's a fascinating problem and deserves some more
investigation, even if the result is that we simply keep our current
approach.

** Changed in: software-center (Ubuntu)
   Importance: Undecided => Medium

** Changed in: software-center (Ubuntu)
       Status: New => Triaged

-- 
You received this bug notification because you are a member of Desktop
Packages, which is subscribed to software-center in Ubuntu.
https://bugs.launchpad.net/bugs/894468

Title:
  Statistics algorithm for sorting ratings looks fishy

Status in “software-center” package in Ubuntu:
  Triaged

Bug description:
  Here's the current code snippet for sorting the Software Center
  Ratings:

  def wilson_score(pos, n, power=0.2):
      if n == 0:
          return 0
      z = pnormaldist(1-power/2)
      phat = 1.0 * pos / n
      return (phat + z*z/(2*n) - z *
              math.sqrt((phat*(1-phat)+z*z/(4*n))/n)) / (1+z*z/n)

  def calc_dr(ratings, power=0.1):
      '''Calculate the dampened rating for an app given its collective ratings'''
      if not len(ratings) == 5:
          raise AttributeError('ratings argument must be a list of 5 integers')
     
      tot_ratings = 0
      for i in range (0,5):
          tot_ratings = ratings[i] + tot_ratings
        
      sum_scores = 0.0
      for i in range (0,5):
          ws = wilson_score(ratings[i], tot_ratings, power)
          sum_scores = sum_scores + float((i+1)-3) * ws
     
      return sum_scores + 3


  This looks very fishy to me, as we are calculating 5 different Wilson
  scores per rating and summing them.  This is slow, and probably wrong.
  I'm not 100% sure what the right method to use is; however, I did find
  this question asked on Math Overflow:

  http://mathoverflow.net/questions/20727/generalizing-the-wilson-score-confidence-interval-to-other-distributions

  The current answer there suggests using a standard normal distribution
  for large samples and a t-distribution for small ones (we do neither).
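
  For reference, the Wilson lower bound can be written as one
  self-contained function. This is only a sketch: it uses
  statistics.NormalDist (Python 3.8+) in place of the external
  pnormaldist helper, and the function name and confidence parameter
  are illustrative, not software-center API.

```python
# Sketch of the standard Wilson score lower bound for a binomial
# proportion.  statistics.NormalDist (Python 3.8+) stands in for the
# external pnormaldist helper used in the snippet above; the name
# wilson_lower_bound and the confidence parameter are illustrative.
import math
from statistics import NormalDist

def wilson_lower_bound(pos, n, confidence=0.95):
    """Lower bound of the Wilson interval for pos successes out of n."""
    if n == 0:
        return 0.0
    # Two-sided z value, e.g. ~1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    phat = pos / n
    centre = phat + z * z / (2 * n)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)
```

  Note the design property this buys: 10 positives out of 10 gives a
  lower bound of roughly 0.72, while 90 out of 100 gives roughly 0.83,
  so a small sample with a perfect score ranks below a larger sample
  with a slightly worse observed proportion.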

  This website suggests a slightly different Wilson algorithm:
  http://www.goproblems.com/test/wilson/wilson.php?v1=0&v2=0&v3=3&v4=2&v5=4 

  
  I will go further and assert that we are making a conceptual error in
  trying to estimate a mean rating in the first place: ratings are
  fundamentally ordinal data, so a mean doesn't make much sense, for the
  same reason that "excellent" + "terrible" does not balance out to
  "mediocre".  Taking medians and percentiles, however, is a perfectly
  valid measurement.
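
  A toy illustration of that point (the numbers and helper names here
  are mine, purely for exposition):

```python
# Toy illustration: a polarised app (half "terrible", half "excellent")
# has the same mean star rating as a uniformly "mediocre" one, but a
# very different median.  Histograms are [n_1star, ..., n_5star]; the
# helper names are illustrative only.
def mean_star(ratings):
    total = sum(ratings)
    return sum(star * n for star, n in enumerate(ratings, start=1)) / total

def median_star(ratings):
    """Lower median star value of the histogram."""
    half = (sum(ratings) + 1) // 2   # 1-based position of the median
    cum = 0
    for star, n in enumerate(ratings, start=1):
        cum += n
        if cum >= half:
            return star

polarised = [10, 0, 0, 0, 10]  # ten 1-star and ten 5-star ratings
mediocre = [0, 0, 20, 0, 0]    # twenty 3-star ratings
```

  Both histograms have a mean of 3.0, yet their (lower) medians are 1
  and 3 respectively.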

  I will research this question a bit more, and will probably post a
  question on the beta stats Stack Exchange site for advice.
  Intuitively, though, I think we may want a ratings algorithm that
  sorts primarily by median, and then, for the large number of cases
  where two apps have the same median (since there are only 5 possible
  rating values), breaks the tie by computing a Wilson lower bound on
  the probability that a rater of App A would rate >= the median rather
  than below it.
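
  To make the idea concrete, here is a rough sketch of that sort order
  under the assumptions above; everything in it (the key shape, the
  helper names, the 95% confidence choice) is an illustrative
  assumption, not a worked-out proposal:

```python
# Sketch of the proposed ordering: sort by median star rating, then
# break ties with the Wilson lower bound on P(rating >= median).  All
# names and the 95% confidence level are illustrative assumptions.
import math
from statistics import NormalDist

def wilson_lower_bound(pos, n, confidence=0.95):
    if n == 0:
        return 0.0
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    phat = pos / n
    return ((phat + z * z / (2 * n)
             - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n))
            / (1 + z * z / n))

def sort_key(ratings):
    """ratings is a histogram [n_1star, ..., n_5star]."""
    total = sum(ratings)
    if total == 0:
        return (0, 0.0)
    half = (total + 1) // 2   # 1-based position of the (lower) median
    cum = 0
    for star, count in enumerate(ratings, start=1):
        cum += count
        if cum >= half:
            median = star
            break
    at_or_above = sum(ratings[median - 1:])  # raters at or above median
    return (median, wilson_lower_bound(at_or_above, total))

apps = {"A": [0, 1, 2, 10, 30], "B": [5, 5, 5, 5, 5], "C": [0, 0, 0, 0, 3]}
ranked = sorted(apps, key=lambda name: sort_key(apps[name]), reverse=True)
# A and C share a median of 5, but A's larger sample wins the tiebreak.
```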

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/software-center/+bug/894468/+subscriptions

-- 
Mailing list: https://launchpad.net/~desktop-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~desktop-packages
More help   : https://help.launchpad.net/ListHelp
