I haven't read this in detail, and I expect most of it doesn't apply to our needs, but it may still offer good intuitions. The basic topic is certainly relevant.
Paper: http://arxiv.org/abs/1006.2183
Related article: http://www.deepdyve.com/lp/association-for-computing-machinery/parallel-sparse-matrix-vector-and-matrix-transpose-vector-01pjkq6QwF
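For context, here is a minimal sketch of the asymmetry the related article's title points at. It assumes a plain CSR layout (`indptr`, `indices`, `data`), and the function names are hypothetical, not taken from either link. With CSR, y = Ax parallelizes cleanly over rows. Computing y = A^T x from the same arrays scatters into y, so a naive row-parallel version would have write conflicts.

```python
def csr_matvec(indptr, indices, data, x):
    """y = A x for A stored in CSR form.

    Each row's dot product is independent, so rows could be split
    across threads with no conflicting writes to y.
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y[i] = acc
    return y


def csr_matvec_transpose(indptr, indices, data, x, n_cols):
    """y = A^T x using the same CSR arrays (no explicit transpose).

    Row i scatters into y[indices[k]], so if rows were processed in
    parallel, two threads could update the same y[j] concurrently.
    That would need atomics, per-thread buffers, or a different layout.
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_cols
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            y[indices[k]] += data[k] * x[i]
    return y


# A = [[1, 0, 2],
#      [0, 3, 0]]
indptr, indices, data = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
print(csr_matvec(indptr, indices, data, [1.0, 1.0, 1.0]))       # [3.0, 3.0]
print(csr_matvec_transpose(indptr, indices, data, [1.0, 2.0], 3))  # [1.0, 6.0, 2.0]
```

This sketch is serial on purpose. The point is only where the writes land, which is what any parallel scheme has to deal with.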
