I did mention this in my previous email: this is not time series data. I understand how to structure it when it is time series data.
What do you mean by globally sorted? Do you mean keeping every partition sorted (I come from the Cassandra world)?

rowkey1 -> blob
page -> int or long or bigint
col1 -> text
col2 -> blob
col3 -> bigint

On Wed, Oct 12, 2016 at 1:37 AM, Dorian Hoxha <dorian.ho...@gmail.com> wrote:

> There are some issues working on larger partitions.
> Hbase doesn't do what you say! You also have to be careful on hbase not
> to create large rows! But since they are globally-sorted, you can easily
> sort between them and create small rows.
>
> In my opinion, cassandra people are wrong, in that they say "globally
> sorted is the devil!" while all fb/google/etc actually use globally-sorted
> most of the time! You have to be careful though (just like with random
> partition)
>
> Can you tell what rowkey1, page1, col(x) actually are ? Maybe there is a
> way.
> The most "recent", means there's a timestamp in there ?
>
> On Wed, Oct 12, 2016 at 9:58 AM, Kant Kodali <k...@peernova.com> wrote:
>
>> Hi All,
>>
>> I understand Cassandra can have a maximum of 2B rows per partition but in
>> practice some people seem to suggest the magic number is 100K. why not
>> create another partition/rowkey automatically (whenever we reach a safe
>> limit that we consider would be efficient) with auto increment bigint as
>> a suffix appended to the new rowkey? so that the driver can return the new
>> rowkey indicating that there is a new partition and so on...Now I
>> understand this would involve allowing partial row key searches which
>> currently Cassandra wouldn't do (but I believe HBASE does) and thinking
>> about token ranges and potentially many other things..
>>
>> My current problem is this
>>
>> I have a row key followed by bunch of columns (this is not time series
>> data)
>> and these columns can grow to any number so since I have 100K limit (or
>> whatever the number is. say some limit) I want to break the partition into
>> level/pages
>>
>> rowkey1, page1->col1, col2, col3......
>> rowkey1, page2->col1, col2, col3......
>>
>> now say my Cassandra db is populated with data and say my application
>> just got booted up and I want the most recent value of a certain partition
>> but I don't know which page it belongs to since my application just got
>> booted up? how do I solve this in the most efficient way that is possible in
>> Cassandra today? I understand I can create MV, other tables that can hold
>> some auxiliary data such as number of pages per partition and so on..but
>> that involves the maintenance cost of that other table which I cannot
>> afford really because I have MV's, secondary indexes for other good
>> reasons. so it would be great if someone can explain the best way possible
>> as of today with Cassandra? By best way I mean is it possible with one
>> request? If Yes, then how? If not, then what is the next best way to solve
>> this?
>>
>> Thanks,
>> kant
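
To make the paging problem in the quoted question concrete, here is a minimal in-memory sketch in Python. It is not Cassandra driver code and not a recommended design: the class `PagedStore`, its methods, and the tiny `PAGE_LIMIT` of 3 are all hypothetical names and values chosen for illustration (the thread talks about a limit around 100K). It assumes the partition key is (rowkey, page), that a write rolls over to the next page once the current page is full, and that a freshly booted reader has no stored page count, so it probes pages one at a time.

```python
from collections import defaultdict

PAGE_LIMIT = 3  # hypothetical per-partition limit; the thread mentions ~100K


class PagedStore:
    """In-memory stand-in for a table partitioned by (rowkey, page)."""

    def __init__(self, page_limit=PAGE_LIMIT):
        self.page_limit = page_limit
        # (rowkey, page) -> list of (column, value) in insertion order
        self.partitions = defaultdict(list)
        self.reads = 0  # number of simulated partition reads

    def _read(self, rowkey, page):
        """Simulate one single-partition read."""
        self.reads += 1
        return self.partitions.get((rowkey, page), [])

    def latest_page(self, rowkey):
        """Probe page 2, 3, 4, ... until an empty page is found."""
        page = 1
        while self._read(rowkey, page + 1):
            page += 1
        return page

    def write(self, rowkey, column, value):
        """Append to the latest page, rolling over when it is full."""
        page = self.latest_page(rowkey)
        if len(self.partitions[(rowkey, page)]) >= self.page_limit:
            page += 1  # start a new partition for this rowkey
        self.partitions[(rowkey, page)].append((column, value))
        return page

    def most_recent(self, rowkey):
        """Return the last (column, value) written, or None if no data."""
        rows = self._read(rowkey, self.latest_page(rowkey))
        return rows[-1] if rows else None


if __name__ == "__main__":
    store = PagedStore()
    for i in range(7):
        store.write("rowkey1", f"col{i + 1}", i)
    store.reads = 0  # pretend the application has just booted
    print(store.most_recent("rowkey1"), "after", store.reads, "reads")
```

In this sketch the number of reads needed after a restart grows with the number of pages, which is the cost the question is trying to avoid without maintaining a separate table of page counts.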