Hi,

I have some ideas for MLlib that I think might be of general interest, so I'd like to see what people think and maybe find some collaborators.
(1) Some form of Markov chain Monte Carlo such as Gibbs sampling or Metropolis-Hastings. Monte Carlo methods are readily parallelized, so Spark seems like a natural platform for them. MCMC plays an important role in computational implementations of Bayesian inference.

(2) A function to compute the calibration of a probabilistic classifier. The question this answers is: if the classifier outputs 0.x for some group of examples, is the actual proportion of positive examples in that group approximately 0.x? This is useful to know if the classifier outputs are used to compute expected loss in some decision procedure.

Of course (1) is much bigger than (2). Perhaps (2) is a one-person job, but (1) will take a lot of teamwork. I am thinking that in the short term we could at least make some progress on an outline or framework for (1).

I am a newcomer to Scala and Spark, but I have a lot of experience in statistical computing. I am thinking that one or the other of these projects would be a good way for me to learn more about Spark and make a useful contribution.

Thanks for your interest and I look forward to your comments.

Robert Dodier
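
P.S. To make (2) concrete, here is a minimal sketch of the calibration check in plain Python. It is not Spark or MLlib code, and the function name `calibration_table`, the equal-width binning, and the `n_bins` parameter are assumptions chosen only for illustration. It groups examples by predicted probability and compares the mean prediction in each group with the observed proportion of positives.

```python
from collections import defaultdict


def calibration_table(scores, labels, n_bins=10):
    """Compare predicted probabilities with observed positive rates.

    scores: predicted probabilities in [0, 1]
    labels: 0/1 outcomes, same length as scores
    Returns a list of (bin_index, count, mean_score, observed_rate),
    one tuple per non-empty bin, sorted by bin index.
    """
    # bin index -> [count, sum of scores, number of positives]
    acc = defaultdict(lambda: [0, 0.0, 0])
    for p, y in zip(scores, labels):
        # Equal-width bins (an assumption); a score of exactly 1.0
        # is placed in the last bin rather than in a bin of its own.
        b = min(int(p * n_bins), n_bins - 1)
        acc[b][0] += 1
        acc[b][1] += p
        acc[b][2] += y
    return [(b, n, s / n, k / n) for b, (n, s, k) in sorted(acc.items())]


# Hypothetical data: a well-calibrated classifier should give
# mean_score close to observed_rate in every bin.
scores = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
labels = [0, 0, 0, 1, 1, 1, 1, 0]
table = calibration_table(scores, labels)
# table == [(2, 4, 0.25, 0.25), (7, 4, 0.75, 0.75)]
```

The per-bin quantities are just counts and sums, so I would expect them to map onto a keyed aggregation over (score, label) pairs in Spark, but I have not tried that yet.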