If you want to parallelize (a good idea in general), you are best
served by doing so across rows rather than across columns.
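To make that concrete, here is a minimal sketch of the row-oriented
approach against the 0.5-era Thrift API: a single multiget_slice call
fans the read out across several row keys, and therefore across the
nodes that own them. The keyspace ('Keyspace1'), column family
('Documents'), row keys, and handle() hook are all hypothetical.

# Sketch only: fan one read out across rows with multiget_slice.
# Assumes the Thrift-generated Python bindings for Cassandra 0.5;
# keyspace, CF, and key names are made up for illustration.
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from cassandra import Cassandra
from cassandra.ttypes import (ColumnParent, SlicePredicate, SliceRange,
                              ConsistencyLevel)

socket = TSocket.TSocket('localhost', 9160)
transport = TTransport.TBufferedTransport(socket)
transport.open()
client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))

keys = ['doc-0001', 'doc-0002', 'doc-0003']  # data modeled as many rows
parent = ColumnParent(column_family='Documents')
predicate = SlicePredicate(slice_range=SliceRange(
    start='', finish='', reversed=False, count=1000))

# Each row is owned (and replicated) independently, so one call
# spreads the read across the nodes responsible for the keys.
result = client.multiget_slice('Keyspace1', keys, parent,
                               predicate, ConsistencyLevel.ONE)
for key, columns in result.items():
    handle(key, columns)  # hypothetical per-row processing hook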
(Another possibility, if you have a relatively static breakdown of
columns that makes sense, is to spread them across different CFs with
the same key.)

-Jonathan

On Mon, Feb 1, 2010 at 7:32 PM, Cagatay Kavukcuoglu
<cagatay.kavukcuo...@gmail.com> wrote:
> A large column slice in my case is tens of thousands of columns, each
> a few K's in size and independent in processing from others. My plan
> was to read slices of a few hundred to a thousand columns and process
> them in a pipeline for reduced overall latency. Regardless of my
> specific case, though, I thought one of the best ways to get good
> performance scaling in Cassandra was to distribute reads and writes to
> multiple nodes. Are there situations where that's not a good idea?
>
> CK.
>
> On Mon, Feb 1, 2010 at 6:00 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> No. Why do you want to do multiple parallel reads instead of one
>> sequential read?
>>
>> On Mon, Feb 1, 2010 at 4:45 PM, Cagatay Kavukcuoglu
>> <caga...@kavukcuoglu.org> wrote:
>>> Hi,
>>>
>>> What's the recommended way to do parallel reads of a large slice of
>>> columns when one doesn't know enough about the column names to divide
>>> them for parallel reading in a meaningful way? SliceRange allows
>>> setting the start and finish column names, but you wouldn't be able to
>>> set the start field of the next read until the previous read
>>> completed. An offset field for SliceRange would have worked, but I
>>> don't see it. Is there a way to divide the big read query into
>>> multiple *parallel* small read queries without requiring advance
>>> knowledge of the column names?
>>>
>>> CK.
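As for the "one sequential read" suggested upthread: without an offset
field, the usual pattern is to page by column name, restarting each
slice from the last name seen. Since the start bound is inclusive,
every page after the first re-fetches one column, which gets skipped.
A sketch under the same 0.5-era Thrift assumptions as above; process()
is a hypothetical per-column callback:

from cassandra.ttypes import (ColumnParent, SlicePredicate, SliceRange,
                              ConsistencyLevel)

# Sketch only: `client` is a connected Cassandra.Client as in the
# earlier example; keyspace, key, and CF names are placeholders.
def read_row_sequentially(client, keyspace, key, column_family,
                          process, page_size=1000):
    parent = ColumnParent(column_family=column_family)
    start, first_page = '', True
    while True:
        predicate = SlicePredicate(slice_range=SliceRange(
            start=start, finish='', reversed=False, count=page_size))
        cols = client.get_slice(keyspace, key, parent,
                                predicate, ConsistencyLevel.ONE)
        # start is inclusive, so pages after the first begin with the
        # previous page's last column; drop the duplicate.
        for cosc in (cols if first_page else cols[1:]):
            process(cosc.column)
        if len(cols) < page_size:
            break  # a short page means we reached the end of the row
        start, first_page = cols[-1].column.name, False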