Thanks for the help, Jonathan.  Given that the current implementation
isn't optimized for large supercolumns and given that the current
thrift api doesn't support slicing a set of columns across multiple
supercolumns of the same row anyway, I agree that I'd be better off
just folding my supercolumns into separate row keys.
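
A minimal sketch of that folding, assuming a simple composite row key of the form "rowkey:supercolumn" (the function name and key format are illustrative, not from this thread or the Cassandra API):

```python
def fold_supercolumns(rows):
    """Flatten {row_key: {supercolumn: {column: value}}} into
    {"row_key:supercolumn": {column: value}}, so each former
    supercolumn becomes its own independently sliceable row."""
    folded = {}
    for row_key, supercolumns in rows.items():
        for sc_name, columns in supercolumns.items():
            # Hypothetical composite key; any unambiguous separator works.
            folded[f"{row_key}:{sc_name}"] = dict(columns)
    return folded
```

With this layout, a get_slice on one of the folded rows touches only that former supercolumn's subcolumns, instead of deserializing the whole supercolumn.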

That's actually what my colleague has been doing for his HBase data
model since HBase doesn't have supercolumns; we're currently
evaluating Cassandra and HBase to see which one we should
productionize.

Edmond

On Thu, Oct 22, 2009 at 3:34 PM, Jonathan Ellis <[email protected]> wrote:
> Okay, so the fundamental problem is that deserializing a supercolumn
> with 30k subcolumns is really really slow. (Like we say on
> http://wiki.apache.org/cassandra/CassandraLimitations, "avoid a data
> model that requires large numbers of subcolumns.")
>
> But we were also being needlessly inefficient after deserialization;
> I've attached a patch (against trunk) to
> https://issues.apache.org/jira/browse/CASSANDRA-510.  This gives a
> 30-50% improvement in my tests.
>
> You're looking for more like an order of magnitude improvement though,
> so I would say splitting each supercolumn off into its own row is
> probably the way to go.
>
> -Jonathan
>
