In practice, local secondary indexes scale to roughly (RF * the limit of a
single machine), where RF is the replication factor, for low-cardinality
values (e.g. users living in a certain state), since the first node queried
is likely to be able to answer your question. This also makes them a good
fit for filtering in analytics workloads.

On the other hand, they are not very useful for high-cardinality values (e.g.
users born at a particular second), because in the worst case you have to
query every node in your cluster, and you are much more likely to hit the
worst case with rare values.
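
To make the difference concrete, here is a toy model of the behaviour
described above. It is not Cassandra code: the node layout, the column names
and the nodes_contacted helper are all made up for illustration, and it
simply assumes a query walks the nodes one at a time until it has collected
`limit` matching rows.

```python
# Toy model, not Cassandra code: each "node" is a list of locally stored
# rows, and a query visits nodes in order until it has `limit` matches.
STATES = ["CA", "TX", "NY", "WA", "FL"]
NUM_NODES = 10
ROWS_PER_NODE = 100


def build_cluster():
    """Spread 1000 hypothetical users over 10 nodes, 100 rows per node."""
    nodes = []
    for n in range(NUM_NODES):
        rows = []
        for i in range(n * ROWS_PER_NODE, (n + 1) * ROWS_PER_NODE):
            rows.append({
                "id": i,
                "state": STATES[i % len(STATES)],  # low cardinality: 5 values
                "birth_second": i,                 # high cardinality: unique
            })
        nodes.append(rows)
    return nodes


def nodes_contacted(nodes, column, value, limit):
    """Count how many nodes are visited before `limit` matches are found."""
    found = 0
    contacted = 0
    for rows in nodes:
        contacted += 1
        found += sum(1 for row in rows if row[column] == value)
        if found >= limit:
            break
    return contacted


cluster = build_cluster()
low = nodes_contacted(cluster, "state", "TX", limit=10)
high = nodes_contacted(cluster, "birth_second", 950, limit=10)
print(low, high)
```

With this layout the state query is satisfied by the first node, while the
birth_second query never reaches its limit and ends up visiting all 10
nodes.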

If you have high cardinality values, it is currently recommended to build
your own secondary indexes from the client side, as you suggested. Triggers
may help you perform this distributed indexing in the near future: see
CASSANDRA-1311.
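
A minimal sketch of the client-side approach, using plain Python dicts as
stand-ins for two column families. The names (users, users_by_birth_second,
put_user, find_by_birth_second) are hypothetical, and a real client would
issue the equivalent writes through whatever Cassandra client library you
use. The idea is that the indexed value becomes the row key of the index
column family, so a lookup is a single-row read rather than a scan of every
node.

```python
from collections import defaultdict

# Stand-ins for two column families (hypothetical names):
#   users:                 row key = user id, columns = user attributes
#   users_by_birth_second: row key = indexed value, column names = user ids
users = {}
users_by_birth_second = defaultdict(dict)


def put_user(user_id, birth_second, name):
    """Write the user row and keep the inverted index row in step."""
    old = users.get(user_id)
    if old is not None and old["birth_second"] != birth_second:
        # Remove the stale index entry before writing the new one.
        users_by_birth_second[old["birth_second"]].pop(user_id, None)
    users[user_id] = {"id": user_id, "birth_second": birth_second, "name": name}
    users_by_birth_second[birth_second][user_id] = ""  # value is unused


def find_by_birth_second(birth_second):
    """Read one index row, then fetch the user rows it points to."""
    user_ids = sorted(users_by_birth_second.get(birth_second, {}))
    return [users[uid] for uid in user_ids]


put_user("u1", 100, "Ann")
put_user("u2", 100, "Bob")
put_user("u1", 200, "Ann")  # update: u1 moves from index row 100 to 200
```

Note that the two writes in put_user are not atomic in this sketch, so a
real implementation has to decide how to handle a failure between them.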

On Tue, Feb 22, 2011 at 4:45 PM, Piotr J. <pio...@gmail.com> wrote:

> Hi, As far as I understand, automatic secondary indexes are generated for
> node-local data.
>
> In this case, a query by secondary index involves all nodes storing part of
> the column family to get results (?), so (if I am right) if data is spread
> across 50 nodes, then 50 nodes are involved in a single query?
>
> How far can this scale? Is this more scalable than manual secondary indexes
> (an inverted index column family)? A few nodes or a hundred nodes?
>
> Regards
>
