The "2 billion column limit" is press-clipping "puffery". The statement seemingly became popular because of a highly trafficked story in which a tech reporter embellished a statement to make a splashy article.
The effect is something like this: http://www.healthnewsreview.org/2012/08/iced-tea-kidney-stones-and-the-study-that-never-existed/

Iced tea does not cause kidney stones! Cassandra does not store rows with 2 billion columns! It is just not true.

On Wed, Oct 12, 2016 at 4:57 AM, Kant Kodali <k...@peernova.com> wrote:
> Well, 1) I have not sent it to the postgresql mailing lists, and 2) I thought this was an open-ended question, as it can involve ideas from everywhere, including the Cassandra Java driver mailing list, so sorry if that bothered you for some reason.
>
> On Wed, Oct 12, 2016 at 1:41 AM, Dorian Hoxha <dorian.ho...@gmail.com> wrote:
>> Also, I'm not sure, but I don't think it's "cool" to write to multiple lists in the same message (based on the postgresql mailing list rules). For example, I'm not subscribed to those lists, and now the messages are separated.
>>
>> On Wed, Oct 12, 2016 at 10:37 AM, Dorian Hoxha <dorian.ho...@gmail.com> wrote:
>>> There are some issues working with larger partitions. HBase doesn't do what you say! You also have to be careful in HBase not to create large rows! But since rows are globally sorted, you can easily sort between them and create small rows.
>>>
>>> In my opinion, the Cassandra people are wrong when they say "globally sorted is the devil!", since fb/google/etc. actually use globally sorted storage most of the time. You have to be careful, though (just like with random partitioning).
>>>
>>> Can you tell what rowkey1, page1, col(x) actually are? Maybe there is a way. "Most recent" means there's a timestamp in there?
>>>
>>> On Wed, Oct 12, 2016 at 9:58 AM, Kant Kodali <k...@peernova.com> wrote:
>>>> Hi All,
>>>>
>>>> I understand Cassandra can have a maximum of 2B rows per partition, but in practice some people seem to suggest the magic number is 100K.
>>>> Why not create another partition/rowkey automatically (whenever we reach a safe limit that we consider efficient), with an auto-incremented bigint appended to the new rowkey as a suffix, so that the driver can return the new rowkey, indicating that there is a new partition, and so on? Now, I understand this would involve allowing partial-rowkey searches, which Cassandra currently doesn't support (but I believe HBase does), and thinking about token ranges and potentially many other things.
>>>>
>>>> My current problem is this:
>>>>
>>>> I have a row key followed by a bunch of columns (this is not time-series data), and these columns can grow to any number. Since I have a 100K limit (or whatever the number is; say, some limit), I want to break the partition into levels/pages:
>>>>
>>>> rowkey1, page1 -> col1, col2, col3......
>>>> rowkey1, page2 -> col1, col2, col3......
>>>>
>>>> Now say my Cassandra DB is populated with data and my application has just booted up, and I want the most recent value of a certain partition, but I don't know which page it belongs to, since my application just booted up. How do I solve this in the most efficient way possible in Cassandra today? I understand I can create MVs or other tables that hold auxiliary data, such as the number of pages per partition, and so on, but that involves the maintenance cost of that other table, which I really cannot afford, because I already have MVs and secondary indexes for other good reasons. So it would be great if someone could explain the best way possible as of today with Cassandra. By "best way" I mean: is it possible with one request? If yes, then how? If not, then what is the next-best way to solve this?
>>>>
>>>> Thanks,
>>>> kant
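For what it's worth, one way to locate the highest page of such a bucketed partition without maintaining an auxiliary index table is exponential probing followed by binary search over the page numbers. This is only a sketch of that idea, not an established Cassandra pattern: `page_exists` is a hypothetical stand-in for a single-row read such as `SELECT page FROM t WHERE rowkey = ? AND page = ? LIMIT 1`. It is not one request, but it is O(log n) requests for n pages.

```python
def find_last_page(page_exists):
    """Return the highest page number p >= 1 for which page_exists(p)
    is truthy, or 0 if even page 1 is empty.

    page_exists is a hypothetical callback standing in for one
    single-row Cassandra read per probe; the whole search costs
    O(log n) probes for n pages.
    """
    if not page_exists(1):
        return 0
    # Phase 1: exponential probing. Double until we find an empty
    # page, so the last page lies in the interval [lo, lo * 2).
    lo = 1
    while page_exists(lo * 2):
        lo *= 2
    hi = lo * 2  # page lo exists, page hi does not
    # Phase 2: binary search in (lo, hi) for the last existing page.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if page_exists(mid):
            lo = mid
        else:
            hi = mid
    return lo


# Example against an in-memory stand-in: pages 1..13 are populated.
print(find_last_page(lambda page: page <= 13))  # -> 13
```

Note this assumes pages are written densely (1, 2, 3, ... with no gaps), which matches the auto-increment scheme described above; with gaps the probe could stop early. A fixed-size bucketing function (e.g. hashing the column name to one of k pages) would avoid the search entirely, at the cost of reading all k pages to find the most recent value.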