Thank you DuyHai. I was in two minds about large partitions for my app. I thought upgrading to 3.x would be a good and easy option, but now I'm going to work on refactoring my data model :)
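For anyone refactoring along the same lines, the rule-of-thumb arithmetic from this thread (~100 MB per partition, ~1 KB per row, so on the order of 100K rows) maps naturally onto the synthetic-bucket-suffix idea raised below. This is a minimal sketch in plain Python; the names are illustrative, not from any Cassandra driver API:

```python
# Rule-of-thumb sizing from the thread: ~100 MB per partition, ~1 KB per row.
TARGET_PARTITION_BYTES = 100 * 1024 * 1024
AVG_ROW_BYTES = 1024

# Roughly how many rows fit before the soft limit (about 100K, as discussed).
rows_per_partition = TARGET_PARTITION_BYTES // AVG_ROW_BYTES

def bucket_for(seq_no: int, rows_per_bucket: int = rows_per_partition) -> int:
    """Map a monotonically increasing row number to a synthetic bucket id,
    so the physical partition key becomes (logical_key, bucket)."""
    return seq_no // rows_per_bucket

print(rows_per_partition)   # 102400 -- i.e. on the order of 100K rows
print(bucket_for(250_000))  # 2 -- this row lands in the third bucket
```

The application (not Cassandra) owns the bucketing: writes compute the bucket from a sequence number, and reads must know (or discover) which bucket to query.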
2016-10-15 20:38 GMT+09:00 DuyHai Doan <doanduy...@gmail.com>:

> Yes, more or less. The 100Mb is a rule of thumb. No one will blame you for
> storing 200Mb, for example. The figure is just given as an example of the
> order of magnitude.
>
> On Sat, Oct 15, 2016 at 1:37 PM, Kant Kodali <k...@peernova.com> wrote:
>
>> You mean 100MB (megabytes)? Also, the data in each of my columns is about
>> 1KB, so in that case the optimal size is 100K columns (since 100K * 1KB =
>> 100MB), right?
>>
>> On Sat, Oct 15, 2016 at 4:26 AM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>
>>> "2) so what is the optimal limit in terms of data size?"
>>>
>>> --> Usual recommendations for Cassandra 2.1 are:
>>>
>>> a. max 100Mb per partition
>>> b. or up to 10,000,000 physical columns per partition (including
>>> clustering columns etc.)
>>>
>>> Recently, with the work of Robert Stupp (CASSANDRA-11206) and also the
>>> huge enhancement from Michael Kjellman (CASSANDRA-9754), it will be
>>> easier to handle huge partitions in memory, especially with a reduced
>>> memory footprint with regard to the JVM heap.
>>>
>>> However, as long as we don't have repair and streaming processes that
>>> can be "resumed" in the middle of a partition, the operational pains
>>> will still be there. Same for compaction.
>>>
>>> On Sat, Oct 15, 2016 at 12:00 PM, Kant Kodali <k...@peernova.com> wrote:
>>>
>>>> 1) It would be great if someone could confirm that there is no limit.
>>>> 2) So what is the optimal limit in terms of data size?
>>>>
>>>> Finally, thanks a lot for pointing out all the operational issues!
>>>>
>>>> On Sat, Oct 15, 2016 at 2:39 AM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>>
>>>>> "But is there still a 2B columns limit in the Cassandra code?"
>>>>>
>>>>> --> I remember one of the committers saying that this 2B columns
>>>>> limitation comes from the Thrift era, where you're limited to a max of
>>>>> 2B columns returned to the client for each request.
>>>>> It also applies to the max size of each "page" of data.
>>>>>
>>>>> Since the introduction of the binary protocol and the paging feature,
>>>>> this limitation does not make sense anymore.
>>>>>
>>>>> By the way, if your partition is too wide, you'll face other
>>>>> operational issues way before reaching the 2B columns limit:
>>>>>
>>>>> - compaction taking a looooong time --> heap pressure --> long GC
>>>>> pauses --> nodes flapping
>>>>> - repair & over-streaming: a repair session failing in the middle
>>>>> forces you to re-send the whole big partition --> the receiving node
>>>>> has a bunch of duplicate data --> pressure on compaction
>>>>> - bootstrapping of new nodes: a failure to stream a partition in the
>>>>> middle will force re-sending the whole partition from the beginning
>>>>> --> the receiving node has a bunch of duplicate data --> pressure on
>>>>> compaction
>>>>>
>>>>> On Sat, Oct 15, 2016 at 9:15 AM, Kant Kodali <k...@peernova.com> wrote:
>>>>>
>>>>>> Compacting 10 SSTables, each of them having a 15GB partition, in what
>>>>>> duration?
>>>>>>
>>>>>> On Fri, Oct 14, 2016 at 11:45 PM, Matope Ono <matope....@gmail.com> wrote:
>>>>>>
>>>>>>> Please forget that part of my sentence. To be more precise, maybe I
>>>>>>> should have said "He could compact 10 SSTables, each of which has a
>>>>>>> 15GB partition."
>>>>>>> What I wanted to say is that we can store many more rows (and
>>>>>>> columns) in a partition than before 3.6.
>>>>>>>
>>>>>>> 2016-10-15 15:34 GMT+09:00 Kant Kodali <k...@peernova.com>:
>>>>>>>
>>>>>>>> "Robert said he could treat safely 10 15GB partitions at his
>>>>>>>> presentation" -- this sounds like there is a row limit too, not
>>>>>>>> only columns??
>>>>>>>>
>>>>>>>> If I am reading this correctly, 10 15GB partitions means 10
>>>>>>>> partitions (like 10 row keys; that's too small), with each
>>>>>>>> partition of size 15GB.
>>>>>>>> (That's like 15 million columns, where each column can have data
>>>>>>>> of size 1KB.)
>>>>>>>>
>>>>>>>> On Fri, Oct 14, 2016 at 11:30 PM, Kant Kodali <k...@peernova.com> wrote:
>>>>>>>>
>>>>>>>>> "Robert said he could treat safely 10 15GB partitions at his
>>>>>>>>> presentation" -- this sounds like there is a row limit too, not
>>>>>>>>> only columns??
>>>>>>>>>
>>>>>>>>> If I am reading this correctly, 10 15GB partitions means 10
>>>>>>>>> partitions (like 10 row keys; that's too small), with each
>>>>>>>>> partition of size 15GB. (That's like 10 million columns, where
>>>>>>>>> each column can have data of size 1KB.)
>>>>>>>>>
>>>>>>>>> On Fri, Oct 14, 2016 at 9:54 PM, Matope Ono <matope....@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks to CASSANDRA-11206, I think we can have much larger
>>>>>>>>>> partitions than before 3.6.
>>>>>>>>>> (Robert said he could treat safely 10 15GB partitions at his
>>>>>>>>>> presentation: https://www.youtube.com/watch?v=N3mGxgnUiRY)
>>>>>>>>>>
>>>>>>>>>> But is there still a 2B columns limit in the Cassandra code?
>>>>>>>>>> If so, out of curiosity, I'd like to know where the bottleneck
>>>>>>>>>> is. Could anyone let me know about it?
>>>>>>>>>>
>>>>>>>>>> Thanks, Yasuharu.
>>>>>>>>>>
>>>>>>>>>> 2016-10-13 1:11 GMT+09:00 Edward Capriolo <edlinuxg...@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> The "2 billion column limit" is press-clipping "puffery". This
>>>>>>>>>>> statement seemingly became popular because of a highly
>>>>>>>>>>> trafficked story in which a tech reporter embellished a
>>>>>>>>>>> statement to make a splashy article.
>>>>>>>>>>>
>>>>>>>>>>> The effect is something like this:
>>>>>>>>>>> http://www.healthnewsreview.org/2012/08/iced-tea-kidney-stones-and-the-study-that-never-existed/
>>>>>>>>>>>
>>>>>>>>>>> Iced tea does not cause kidney stones!
>>>>>>>>>>> Cassandra does not store rows with 2 billion columns! It is
>>>>>>>>>>> just not true.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 12, 2016 at 4:57 AM, Kant Kodali <k...@peernova.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Well, 1) I have not sent it to the PostgreSQL mailing lists,
>>>>>>>>>>>> and 2) I thought this was an open-ended question, as it can
>>>>>>>>>>>> involve ideas from everywhere, including the Cassandra Java
>>>>>>>>>>>> driver mailing lists, so sorry if that bothered you for some
>>>>>>>>>>>> reason.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 12, 2016 at 1:41 AM, Dorian Hoxha <dorian.ho...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Also, I'm not sure, but I don't think it's "cool" to write to
>>>>>>>>>>>>> multiple lists in the same message (based on the PostgreSQL
>>>>>>>>>>>>> mailing list rules). For example, I'm not subscribed to
>>>>>>>>>>>>> those, and now the messages are separated.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 12, 2016 at 10:37 AM, Dorian Hoxha <dorian.ho...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> There are some issues working with larger partitions. HBase
>>>>>>>>>>>>>> doesn't do what you say! You also have to be careful in
>>>>>>>>>>>>>> HBase not to create large rows! But since they are globally
>>>>>>>>>>>>>> sorted, you can easily sort between them and create small
>>>>>>>>>>>>>> rows.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In my opinion, the Cassandra people are wrong in that they
>>>>>>>>>>>>>> say "globally sorted is the devil!" while fb/google/etc.
>>>>>>>>>>>>>> actually use globally sorted most of the time! You have to
>>>>>>>>>>>>>> be careful, though (just like with random partitioning).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you tell us what rowkey1, page1, col(x) actually are?
>>>>>>>>>>>>>> Maybe there is a way. The most "recent" -- does that mean
>>>>>>>>>>>>>> there's a timestamp in there?
>>>>>>>>>>>>>> On Wed, Oct 12, 2016 at 9:58 AM, Kant Kodali <k...@peernova.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I understand Cassandra can have a maximum of 2B rows per
>>>>>>>>>>>>>>> partition, but in practice some people seem to suggest the
>>>>>>>>>>>>>>> magic number is 100K. Why not create another
>>>>>>>>>>>>>>> partition/rowkey automatically (whenever we reach a safe
>>>>>>>>>>>>>>> limit that we consider efficient), with an auto-increment
>>>>>>>>>>>>>>> bigint as a suffix appended to the new rowkey, so that the
>>>>>>>>>>>>>>> driver can return the new rowkey indicating that there is a
>>>>>>>>>>>>>>> new partition, and so on? Now I understand this would
>>>>>>>>>>>>>>> involve allowing partial row key searches, which Cassandra
>>>>>>>>>>>>>>> currently doesn't do (but I believe HBase does), and
>>>>>>>>>>>>>>> thinking about token ranges and potentially many other
>>>>>>>>>>>>>>> things.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My current problem is this:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have a row key followed by a bunch of columns (this is
>>>>>>>>>>>>>>> not time-series data), and these columns can grow to any
>>>>>>>>>>>>>>> number. Since I have a 100K limit (or whatever the number
>>>>>>>>>>>>>>> is; say some limit), I want to break the partition into
>>>>>>>>>>>>>>> levels/pages:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> rowkey1, page1 -> col1, col2, col3, ...
>>>>>>>>>>>>>>> rowkey1, page2 -> col1, col2, col3, ...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Now say my Cassandra db is populated with data, my
>>>>>>>>>>>>>>> application has just booted up, and I want the most recent
>>>>>>>>>>>>>>> value of a certain partition, but I don't know which page
>>>>>>>>>>>>>>> it belongs to since my application just got booted up.
>>>>>>>>>>>>>>> How do I solve this in the most efficient way possible in
>>>>>>>>>>>>>>> Cassandra today? I understand I can create MVs or other
>>>>>>>>>>>>>>> tables that hold some auxiliary data, such as the number of
>>>>>>>>>>>>>>> pages per partition, and so on, but that involves the
>>>>>>>>>>>>>>> maintenance cost of that other table, which I really cannot
>>>>>>>>>>>>>>> afford because I already have MVs and secondary indexes for
>>>>>>>>>>>>>>> other good reasons. So it would be great if someone could
>>>>>>>>>>>>>>> explain the best way possible as of today with Cassandra.
>>>>>>>>>>>>>>> By best way I mean: is it possible with one request? If
>>>>>>>>>>>>>>> yes, then how? If not, then what is the next best way to
>>>>>>>>>>>>>>> solve this?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> kant
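On the "which page is most recent after a cold start" question: as far as I know it cannot be done in a single request without denormalizing somewhere, but one common pattern is a tiny "head pointer" row per logical key recording the current page, making a cold start exactly two small reads (pointer, then newest page). A minimal sketch of the read path, modeled with plain dicts instead of a live Cassandra session; all names are hypothetical:

```python
# "Head pointer" pattern for paged partitions, modeled in-memory.
# pages: (logical_key, page_no) -> columns   (the wide, paged data)
# head:  logical_key -> current page number  (one tiny row per key)
pages = {
    ("rowkey1", 1): ["col1", "col2", "col3"],
    ("rowkey1", 2): ["col4", "col5"],  # newest page
}
head = {"rowkey1": 2}

def latest_columns(logical_key):
    """Two-read lookup: fetch the head pointer, then the newest page."""
    page = head.get(logical_key)          # read 1: pointer
    if page is None:
        return []
    return pages.get((logical_key, page), [])  # read 2: the page itself

print(latest_columns("rowkey1"))  # ['col4', 'col5']
```

Writes update the pointer whenever a page rolls over, so the pointer row is small and append-cheap; the trade-off is the same extra-write cost the thread discusses, just confined to one tiny row rather than a full auxiliary table.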