Compacting 10 sstables, each of which has a 15GB partition, in what duration?
On Fri, Oct 14, 2016 at 11:45 PM, Matope Ono <matope....@gmail.com> wrote:

> Please forget the part in my sentence.
> To be more precise, maybe I should have said "He could compact 10
> sstables, each of which has a 15GB partition".
> What I wanted to say is that we can store many more rows (and columns) in a
> partition than before 3.6.
>
> 2016-10-15 15:34 GMT+09:00 Kant Kodali <k...@peernova.com>:
>
>> "Robert said he could safely handle 10 15GB partitions in his presentation"
>> This sounds like there is a row limit too, not only a column limit??
>>
>> If I am reading this correctly, 10 15GB partitions means 10 partitions
>> (like 10 row keys, that's too small) with each partition of size 15GB.
>> (That's like 15 million columns where each column can hold 1KB of data.)
>>
>> On Fri, Oct 14, 2016 at 11:30 PM, Kant Kodali <k...@peernova.com> wrote:
>>
>>> "Robert said he could safely handle 10 15GB partitions in his
>>> presentation" This sounds like there is a row limit too, not only a
>>> column limit??
>>>
>>> If I am reading this correctly, 10 15GB partitions means 10 partitions
>>> (like 10 row keys, that's too small) with each partition of size 15GB.
>>> (That's like 10 million columns where each column can hold 1KB of data.)
>>>
>>> On Fri, Oct 14, 2016 at 9:54 PM, Matope Ono <matope....@gmail.com>
>>> wrote:
>>>
>>>> Thanks to CASSANDRA-11206, I think we can have much larger partitions
>>>> than before 3.6.
>>>> (Robert said he could safely handle 10 15GB partitions in his
>>>> presentation. https://www.youtube.com/watch?v=N3mGxgnUiRY)
>>>>
>>>> But is there still a 2B column limit in the Cassandra code?
>>>> If so, out of curiosity, I'd like to know where the bottleneck is.
>>>> Could anyone let me know about it?
>>>>
>>>> Thanks Yasuharu.
>>>>
>>>> 2016-10-13 1:11 GMT+09:00 Edward Capriolo <edlinuxg...@gmail.com>:
>>>>
>>>>> The "2 billion column limit" is press-clipping "puffery".
>>>>> This statement seemingly became popular because of a highly trafficked
>>>>> story, in which a tech reporter embellished a statement to make a
>>>>> splashy article.
>>>>>
>>>>> The effect is something like this:
>>>>> http://www.healthnewsreview.org/2012/08/iced-tea-kidney-stones-and-the-study-that-never-existed/
>>>>>
>>>>> Iced tea does not cause kidney stones! Cassandra does not store rows
>>>>> with 2 billion columns! It is just not true.
>>>>>
>>>>> On Wed, Oct 12, 2016 at 4:57 AM, Kant Kodali <k...@peernova.com>
>>>>> wrote:
>>>>>
>>>>>> Well, 1) I have not sent it to the postgresql mailing lists, and 2) I
>>>>>> thought this is an open-ended question, as it can involve ideas from
>>>>>> everywhere, including the Cassandra java driver mailing lists, so
>>>>>> sorry if that bothered you for some reason.
>>>>>>
>>>>>> On Wed, Oct 12, 2016 at 1:41 AM, Dorian Hoxha <dorian.ho...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Also, I'm not sure, but I don't think it's "cool" to write to
>>>>>>> multiple lists in the same message (based on the postgresql mailing
>>>>>>> list rules).
>>>>>>> For example, I'm not subscribed to those, and now the messages are
>>>>>>> separated.
>>>>>>>
>>>>>>> On Wed, Oct 12, 2016 at 10:37 AM, Dorian Hoxha
>>>>>>> <dorian.ho...@gmail.com> wrote:
>>>>>>>
>>>>>>>> There are some issues working with larger partitions.
>>>>>>>> HBase doesn't do what you say! You also have to be careful in
>>>>>>>> HBase not to create large rows! But since they are globally sorted,
>>>>>>>> you can easily sort between them and create small rows.
>>>>>>>>
>>>>>>>> In my opinion, cassandra people are wrong, in that they say
>>>>>>>> "globally sorted is the devil!" while all fb/google/etc actually use
>>>>>>>> globally-sorted most of the time! You have to be careful though (just
>>>>>>>> like with random partition).
>>>>>>>>
>>>>>>>> Can you tell what rowkey1, page1, col(x) actually are?
>>>>>>>> Maybe there is a way.
>>>>>>>> The most "recent" means there's a timestamp in there?
>>>>>>>>
>>>>>>>> On Wed, Oct 12, 2016 at 9:58 AM, Kant Kodali <k...@peernova.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> I understand Cassandra can have a maximum of 2B rows per partition,
>>>>>>>>> but in practice some people seem to suggest the magic number is 100K.
>>>>>>>>> Why not create another partition/rowkey automatically (whenever we
>>>>>>>>> reach a safe limit that we consider would be efficient) with an
>>>>>>>>> auto-increment bigint as a suffix appended to the new rowkey, so
>>>>>>>>> that the driver can return the new rowkey indicating that there is
>>>>>>>>> a new partition, and so on? Now I understand this would involve
>>>>>>>>> allowing partial row key searches, which Cassandra currently
>>>>>>>>> wouldn't do (but I believe HBase does), and thinking about token
>>>>>>>>> ranges and potentially many other things.
>>>>>>>>>
>>>>>>>>> My current problem is this:
>>>>>>>>>
>>>>>>>>> I have a row key followed by a bunch of columns (this is not time
>>>>>>>>> series data), and these columns can grow to any number. Since I
>>>>>>>>> have a 100K limit (or whatever the number is, say some limit), I
>>>>>>>>> want to break the partition into levels/pages:
>>>>>>>>>
>>>>>>>>> rowkey1, page1->col1, col2, col3......
>>>>>>>>> rowkey1, page2->col1, col2, col3......
>>>>>>>>>
>>>>>>>>> Now say my Cassandra db is populated with data, my application has
>>>>>>>>> just booted up, and I want the most recent value of a certain
>>>>>>>>> partition, but I don't know which page it belongs to since my
>>>>>>>>> application has just booted up. How do I solve this in the most
>>>>>>>>> efficient way that is possible in Cassandra today?
>>>>>>>>> I understand I can create MVs or other tables that can hold some
>>>>>>>>> auxiliary data, such as the number of pages per partition and so
>>>>>>>>> on, but that involves the maintenance cost of that other table,
>>>>>>>>> which I cannot really afford because I already have MVs and
>>>>>>>>> secondary indexes for other good reasons. So it would be great if
>>>>>>>>> someone could explain the best way possible as of today with
>>>>>>>>> Cassandra. By best way I mean: is it possible with one request?
>>>>>>>>> If yes, then how? If not, then what is the next best way to solve
>>>>>>>>> this?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> kant
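
For readers following the thread, the paging layout described in the original question (rowkey1, page1 -> col1, col2, ...) and the 15GB / 1KB estimate can be sketched as below. This is only an illustration under stated assumptions: the names PAGE_LIMIT, page_key and columns_in_partition are hypothetical, the 100K limit is the "magic number" quoted in the thread rather than a Cassandra constant, and decimal units (1GB = 10^9 bytes, 1KB = 10^3 bytes) are assumed. It does not address how to find the most recent page, which is the open question of the thread.

```python
from typing import Tuple

# Assumed "safe" number of columns per partition; the thread quotes 100K
# as a rule of thumb, it is not a limit enforced by Cassandra.
PAGE_LIMIT = 100_000


def page_key(rowkey: str, column_index: int, limit: int = PAGE_LIMIT) -> Tuple[str, int]:
    """Map the n-th column (0-based) of a logical row to a (rowkey, page) pair.

    Pages are numbered from 1 to mirror "page1", "page2" in the question.
    """
    if column_index < 0:
        raise ValueError("column_index must be non-negative")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (rowkey, column_index // limit + 1)


def columns_in_partition(partition_bytes: int, column_bytes: int) -> int:
    """Rough count of fixed-size columns that fit in a partition (overhead ignored)."""
    return partition_bytes // column_bytes


if __name__ == "__main__":
    # First and last column of page 1, then the first column of page 2.
    print(page_key("rowkey1", 0))
    print(page_key("rowkey1", 99_999))
    print(page_key("rowkey1", 100_000))
    # 15GB partition with 1KB per column, decimal units assumed.
    print(columns_in_partition(15 * 10**9, 10**3))
```

With these assumptions, a 15GB partition holds about 15 million 1KB columns, which matches the corrected figure in the later message.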