Sounds like you want https://issues.apache.org/jira/browse/CASSANDRA-2045
On Tue, Apr 26, 2011 at 8:38 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> So maybe this idea has been sent around before, but I would like to
> know what everyone thinks. We have a huge column family called bigdata,
> let's say 200 GB a node. We have used cass* as you would expect: we
> never read before writing, and during our bulk loading we can get rates
> like 2000 inserts per second per node. This morning I noticed that this CF,
> on only some nodes, had a lot of reads which went on for hours.
>
> Since our apps should not have been reading, I dove in. What was
> happening was that a node was down during the bulk load period. As a result,
> when it came alive, the other node with hints went to deliver them. The
> problem was that the other node was under high I/O trying to deliver hints.
> I see why.
>
> Cassandra does NOT read before write EXCEPT when writing a handoff.
>
> This is not a good thing. It means the bigger the bigdata CF gets, the
> more intensive delivering the hints will be on the sender side. The write
> rate may be 2000 per second, but the hints cannot be read that fast.
>
> I know you can now drop and throttle HH in 0.7.0, but this is not good
> enough, since it only takes longer to get consistent, or you never
> get consistent. So here is my thinking...
>
> Store hints in separate physical files and/or possibly deliver those
> files by streaming.
>
> Maybe there is already a JIRA out there on this. I just woke up, so to
> me it is an original idea :)
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com