Collections that big are likely not what you want. Many people use
Cassandra because they want low-latency reads (<10ms) on smallish row keys
or key slices. Attempting to fetch 10K+ columns in one request generally
does not work well. First, there are network issues: 100K columns of 5
bytes each is a lot of data, it requires large buffers, and Thrift has a
maximum packet size. The largest problem, however, is that you cannot
sub-select part of a collection the way you can slice columns from a row
key with Thrift.
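
For illustration, the usual workaround is to model the elements as
clustering columns instead of a collection, so that a bounded slice can be
requested. A rough sketch against the JDBC driver this thread uses (the
table, column, and method names below are made up for the example):

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Hypothetical schema, one element per CQL row instead of LIST<uuid>:
    //   CREATE TABLE item_ids (
    //       parent_id uuid,
    //       item_id   uuid,
    //       PRIMARY KEY (parent_id, item_id)
    //   );
    // Clustering columns can be sliced and paged; a collection cannot.
    static void printFirstPage(Connection conn) throws Exception {
        Statement stmt = conn.createStatement();
        // Bounded read: 100 elements instead of the entire collection.
        ResultSet rs = stmt.executeQuery(
            "SELECT item_id FROM item_ids"
            + " WHERE parent_id = 6fa459ea-ee8a-3ca4-894e-db77e160355e"
            + " LIMIT 100");
        while (rs.next()) {
            System.out.println(rs.getString("item_id"));
        }
        rs.close();
        stmt.close();
    }

Paging further is then just a matter of adding AND item_id > <last seen
id> to the WHERE clause.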

The lesson being learned the hard way is that collections should not be
very large. This is fairly tricky to guarantee with a blind-write system.
You actually need to do a worst-case assessment to figure out how big your
largest entry could be, and then choose an approach based on that. This is
counterintuitive, because many people design around the normal scenario
and leave the worst case to chance. Then you end up in a spot like this,
where one row is unreadable and impractical to manage.
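
To put rough numbers on that assessment (assuming uuid elements at 16
bytes each, as in the case reported below):

    // Worst case observed below: a ~400,000 element list in one column.
    long elements = 400000L;   // worst-case list length
    long bytesPerUuid = 16L;   // each uuid value is 16 bytes
    // At least 6,400,000 bytes in a single column value, before any
    // per-element overhead: nowhere near a <10ms small read.
    System.out.println(elements * bytesPerUuid);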


On Mon, May 13, 2013 at 2:51 AM, Theo Hultberg <t...@iconara.net> wrote:

> In the CQL3 protocol the sizes of collections are unsigned shorts, so the
> maximum number of elements in a LIST<...> is 65,535. There's no check,
> afaik, that stops you from creating lists that are bigger than that, but
> the protocol doesn't handle returning them (you get the first N mod
> 65,536 items).
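>
> Presumably the driver then reads that 16-bit count back as a signed
> short, which would explain the negative capacity in the trace below (a
> minimal sketch; the exact list length here is inferred from the -14594):
>
>     int n = 378622;                // the "nearly 400,000" item count
>     int capacity = (short) n;      // low 16 bits, sign-extended on read
>     System.out.println(capacity);  // prints -14594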
>
> On the other hand, the JDBC driver talks Thrift rather than the binary
> protocol, doesn't it? In that case there may be other limits.
>
> T#
>
>
> On Mon, May 13, 2013 at 3:26 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
>> 2 billion is the theoretical maximum number of columns under a row. It
>> is NOT the maximum size of a CQL collection. The design of CQL
>> collections currently requires retrieving the entire collection on read.
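>>
>> For example (table and column names made up), the only read CQL offers
>> for a collection column is the whole value:
>>
>>     // There is no slice syntax such as refs[0..99] on a LIST column;
>>     // this pulls back every element in one response.
>>     ResultSet rs = stmt.executeQuery(
>>         "SELECT refs FROM docs"
>>         + " WHERE id = 6fa459ea-ee8a-3ca4-894e-db77e160355e");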
>>
>>
>> On Sun, May 12, 2013 at 11:13 AM, Robert Wille <rwi...@footnote.com> wrote:
>>
>>> I designed a data model for my data that uses a list of UUIDs in a
>>> column. When I designed my data model, my expectation was that most of
>>> the lists would have fewer than a hundred elements, with a few having
>>> several thousand. I discovered in my data a list that has nearly 400,000
>>> items in it. When I try to retrieve it, I get the following exception:
>>>
>>> java.lang.IllegalArgumentException: Illegal Capacity: -14594
>>>         at java.util.ArrayList.<init>(ArrayList.java:110)
>>>         at org.apache.cassandra.cql.jdbc.ListMaker.compose(ListMaker.java:54)
>>>         at org.apache.cassandra.cql.jdbc.TypedColumn.<init>(TypedColumn.java:68)
>>>         at org.apache.cassandra.cql.jdbc.CassandraResultSet.createColumn(CassandraResultSet.java:1086)
>>>         at org.apache.cassandra.cql.jdbc.CassandraResultSet.populateColumns(CassandraResultSet.java:161)
>>>         at org.apache.cassandra.cql.jdbc.CassandraResultSet.<init>(CassandraResultSet.java:134)
>>>         at org.apache.cassandra.cql.jdbc.CassandraStatement.doExecute(CassandraStatement.java:166)
>>>         at org.apache.cassandra.cql.jdbc.CassandraStatement.executeQuery(CassandraStatement.java:226)
>>>
>>>
>>> I get this with Cassandra 1.2.4 and the latest snapshot of the JDBC
>>> driver. Admittedly, several hundred thousand is quite a lot of items,
>>> but it's odd that I'm getting some kind of wraparound, since 400,000 is
>>> a long way from 2 billion.
>>>
>>> What are the physical and practical limits on the size of a list? Is it
>>> possible to retrieve a range of items from a list?
>>>
>>> Thanks in advance
>>>
>>> Robert
>>>
>>>
>>>
>>
>
