The "2 billion column limit" is press-clipping "puffery". The statement seemingly became popular because of a highly trafficked story in which a tech reporter embellished a statement to make a splashy article.
The effect is something like this: http://www.healthnewsreview.org/2012/08/iced-tea-kidney-stones-and-the-study-that-never-existed/

Iced tea does not cause kidney stones! Cassandra does not store rows with 2 billion columns! It is just not true.

On Wed, Oct 12, 2016 at 4:57 AM, Kant Kodali <k...@peernova.com> wrote:
> Well, 1) I have not sent it to the postgresql mailing lists, and 2) I thought this was an open-ended question, as it can involve ideas from everywhere, including the Cassandra Java driver mailing list, so sorry if that bothered you for some reason.
>
> On Wed, Oct 12, 2016 at 1:41 AM, Dorian Hoxha <dorian.ho...@gmail.com> wrote:
>> Also, I'm not sure, but I don't think it's "cool" to write to multiple lists in the same message (based on the postgresql mailing list rules). For example, I'm not subscribed to those lists, and now the messages are separated.
>>
>> On Wed, Oct 12, 2016 at 10:37 AM, Dorian Hoxha <dorian.ho...@gmail.com> wrote:
>>> There are some issues working with larger partitions. HBase doesn't do what you say! You also have to be careful in HBase not to create large rows! But since rows are globally sorted, you can easily sort between them and create small rows.
>>>
>>> In my opinion, the Cassandra people are wrong when they say "globally sorted is the devil!", since fb/google/etc. actually use globally sorted storage most of the time. You have to be careful, though (just like with random partitioning).
>>>
>>> Can you tell what rowkey1, page1, col(x) actually are? Maybe there is a way. "Most recent" means there's a timestamp in there?
>>>
>>> On Wed, Oct 12, 2016 at 9:58 AM, Kant Kodali <k...@peernova.com> wrote:
>>>> Hi All,
>>>>
>>>> I understand Cassandra can have a maximum of 2B rows per partition, but in practice some people seem to suggest the magic number is 100K.
>>>> Why not create another partition/rowkey automatically (whenever we reach a safe limit that we consider efficient), with an auto-incremented bigint appended to the new rowkey as a suffix, so that the driver can return the new rowkey, indicating that there is a new partition, and so on? Now, I understand this would involve allowing partial-rowkey searches, which Cassandra currently doesn't support (but I believe HBase does), and thinking about token ranges and potentially many other things.
>>>>
>>>> My current problem is this:
>>>>
>>>> I have a row key followed by a bunch of columns (this is not time-series data), and these columns can grow to any number. Since I have a 100K limit (or whatever the number is; say, some limit), I want to break the partition into levels/pages:
>>>>
>>>> rowkey1, page1 -> col1, col2, col3......
>>>> rowkey1, page2 -> col1, col2, col3......
>>>>
>>>> Now say my Cassandra DB is populated with data and my application has just booted up, and I want the most recent value of a certain partition, but I don't know which page it belongs to, since my application just booted up. How do I solve this in the most efficient way possible in Cassandra today? I understand I can create MVs or other tables that hold auxiliary data, such as the number of pages per partition, and so on, but that involves the maintenance cost of that other table, which I really cannot afford, because I already have MVs and secondary indexes for other good reasons. So it would be great if someone could explain the best way possible as of today with Cassandra. By "best way" I mean: is it possible with one request? If yes, then how? If not, then what is the next-best way to solve this?
>>>>
>>>> Thanks,
>>>> kant
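For what it's worth, one way to locate the highest page of such a bucketed partition without maintaining an auxiliary index table is exponential probing followed by binary search over the page numbers. This is only a sketch of that idea, not an established Cassandra pattern: `page_exists` is a hypothetical stand-in for a single-row read such as `SELECT page FROM t WHERE rowkey = ? AND page = ? LIMIT 1`. It is not one request, but it is O(log n) requests for n pages.

```python
def find_last_page(page_exists):
    """Return the highest page number p >= 1 for which page_exists(p)
    is truthy, or 0 if even page 1 is empty.

    page_exists is a hypothetical callback standing in for one
    single-row Cassandra read per probe; the whole search costs
    O(log n) probes for n pages.
    """
    if not page_exists(1):
        return 0
    # Phase 1: exponential probing. Double until we find an empty
    # page, so the last page lies in the interval [lo, lo * 2).
    lo = 1
    while page_exists(lo * 2):
        lo *= 2
    hi = lo * 2  # page lo exists, page hi does not
    # Phase 2: binary search in (lo, hi) for the last existing page.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if page_exists(mid):
            lo = mid
        else:
            hi = mid
    return lo


# Example against an in-memory stand-in: pages 1..13 are populated.
print(find_last_page(lambda page: page <= 13))  # -> 13
```

Note this assumes pages are written densely (1, 2, 3, ... with no gaps), which matches the auto-increment scheme described above; with gaps the probe could stop early. A fixed-size bucketing function (e.g. hashing the column name to one of k pages) would avoid the search entirely, at the cost of reading all k pages to find the most recent value.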