I started a "users survey" thread over on the users list (replies are still trickling in), but as useful as that is, I'd like to get feedback that is more quantitative and drawn from a broader base. This will let us prioritize our development efforts to better address what people are actually using Cassandra for, with less guesswork.

For instance: we put a lot of effort into compression for 1.0.0; if it turned out that only 1% of 1.0.x users actually enable compression, then we should spend less effort fine-tuning it going forward, and use the energy elsewhere.
(Of course it could also mean that we did a terrible job getting the word out about new features and explaining how to use them, but either way, it would be good to know!)

I propose adding a basic cluster reporting feature to cassandra.yaml, enabled by default. It would send anonymous information about your cluster to an apache.org VM. This would include information such as the number (but not names) of keyspaces and column families, keyspace-level options like compression, column family options like compaction strategy, data types (again, not names) of columns, average row size (or better: the histogram data), and average sstables per read.

Thoughts?

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com