Well, 1) I have not sent it to the postgresql mailing lists, and 2) I thought
this was an open-ended question that could involve ideas from everywhere,
including the Cassandra java driver mailing lists, so sorry if that bothered you.
On Wed, Oct 12, 2016 at 1:41 AM, Dorian Hoxha <dorian.ho...@gmail.com> wrote:
> Also, I'm not sure, but I don't think it's "cool" to write to multiple
> lists in the same message (based on the postgresql mailing lists' rules).
> For example, I'm not subscribed to those, and now the messages are separated.
> On Wed, Oct 12, 2016 at 10:37 AM, Dorian Hoxha <dorian.ho...@gmail.com> wrote:
>> There are some issues when working with larger partitions.
>> HBase doesn't do what you say! You also have to be careful in HBase not
>> to create large rows! But since rows are globally sorted, you can easily
>> sort across them and keep individual rows small.
>> In my opinion, the Cassandra people are wrong when they say "globally
>> sorted is the devil!", while fb/google/etc actually use globally sorted
>> storage most of the time! You have to be careful though (just like with
>> random partitioning).
>> Can you tell us what rowkey1, page1, and col(x) actually are? Maybe there is a
>> better way to model this.
>> By the most "recent", do you mean there's a timestamp in there?
>> On Wed, Oct 12, 2016 at 9:58 AM, Kant Kodali <k...@peernova.com> wrote:
>>> Hi All,
>>> I understand Cassandra can have a maximum of 2B rows per partition, but
>>> in practice some people seem to suggest the magic number is 100K. Why not
>>> create another partition/rowkey automatically (whenever we reach a safe
>>> limit that we consider efficient), with an auto-incrementing bigint
>>> appended as a suffix to the new rowkey, so that the driver can return the
>>> new rowkey indicating that there is a new partition, and so on? Now I
>>> understand this would involve allowing partial row key searches, which
>>> Cassandra currently doesn't do (but I believe HBase does), and thinking
>>> about token ranges and potentially many other things.
>>> My current problem is this:
>>> I have a row key followed by a bunch of columns (this is not time series,
>>> and these columns can grow to any number), so given the 100K limit (or
>>> whatever the number is; say, some limit) I want to break the partition into
>>> rowkey1, page1 -> col1, col2, col3...
>>> rowkey1, page2 -> col1, col2, col3...
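The manual paging scheme described above can be sketched as a small helper.
This is only an illustration of the idea, not driver or server behavior; the
names (PagedKey, MAX_COLS, the "#" separator) and the fixed per-page column
limit are all hypothetical assumptions.

```java
// Sketch of the manual paging idea: route a new column to a page based on
// how many columns the logical row already holds, and build the physical
// partition key by appending the page number as a suffix. All names here
// are hypothetical; the 100K limit is the "safe limit" from the discussion.
public class PagedKey {
    static final int MAX_COLS = 100_000; // assumed per-page column limit

    // Pages start at 1; columns 0..99_999 land on page 1, and so on.
    static long pageFor(long columnCount) {
        return columnCount / MAX_COLS + 1;
    }

    // Physical partition key, e.g. logical "rowkey1" + page 2 -> "rowkey1#2".
    static String physicalKey(String logicalKey, long page) {
        return logicalKey + "#" + page;
    }

    public static void main(String[] args) {
        System.out.println(physicalKey("rowkey1", pageFor(0)));       // rowkey1#1
        System.out.println(physicalKey("rowkey1", pageFor(150_000))); // rowkey1#2
    }
}
```

With a scheme like this, the writer needs to know the current column count (or
current page) to pick the right physical key, which is exactly where the
bookkeeping cost discussed below comes from.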
>>> Now say my Cassandra db is populated with data, my application just got
>>> booted up, and I want the most recent value of a certain partition, but I
>>> don't know which page it belongs to since my application just booted up.
>>> How do I solve this in the most efficient way possible in Cassandra today?
>>> I understand I can create MVs, or other tables that hold auxiliary data
>>> such as the number of pages per partition, and so on, but that involves
>>> the maintenance cost of that other table, which I cannot really afford
>>> because I already have MVs and secondary indexes for other good reasons.
>>> So it would be great if someone could explain the best way possible as of
>>> today with Cassandra. By best way I mean: is it possible with one request?
>>> If yes, then how? If not, then what is the next best way to solve it?
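If an auxiliary page-count table is off the table, one option is to find the
highest existing page by probing keys: gallop upward (1, 2, 4, 8, ...) until a
read misses, then binary-search the gap, for O(log n) reads instead of one.
The sketch below is an assumption-laden illustration of that probe logic only;
a plain Map stands in for Cassandra, where each `containsKey` call would
instead be a single-partition read against the page's physical key.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: locate the highest contiguous page "logicalKey#p" that exists.
// Galloping doubles the candidate page until a miss, then binary search
// narrows the last hit. The Map is a stand-in for per-page reads.
public class LatestPage {
    static long latestPage(Map<String, ?> store, String logicalKey) {
        if (!store.containsKey(logicalKey + "#1")) return 0; // no pages yet
        long hi = 1;
        while (store.containsKey(logicalKey + "#" + (hi * 2))) hi *= 2;
        long lo = hi;      // last page known to exist
        long up = hi * 2;  // first page known to be missing
        while (up - lo > 1) {
            long mid = (lo + up) / 2;
            if (store.containsKey(logicalKey + "#" + mid)) lo = mid;
            else up = mid;
        }
        return lo;
    }

    public static void main(String[] args) {
        Map<String, Boolean> store = new HashMap<>();
        for (int p = 1; p <= 5; p++) store.put("rowkey1#" + p, true);
        System.out.println(latestPage(store, "rowkey1")); // 5
    }
}
```

This is not a single-request answer: it assumes pages are numbered
contiguously from 1 and trades the auxiliary table's write-time maintenance
for a handful of extra reads at startup.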