I did mention this in my previous email: this is not time series data. I
understand how to structure it when it is time series data.

What do you mean by globally sorted? Do you mean keeping every partition
sorted (I ask because I come from the Cassandra world)?

rowkey1 -> blob
page    -> int or long or bigint
col1    -> text
col2    -> blob
col3    -> bigint
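
To make the layout and the lookup problem concrete, here is a minimal sketch in Python. It is only an in-memory model, not Cassandra driver code: the dict stands in for a table whose partition key is (rowkey, page), and PAGE_LIMIT, append_row and find_latest_page are hypothetical names I made up for illustration. It shows why a freshly booted application that does not know the page count ends up probing one page per request.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for the "safe" rows-per-partition limit (e.g. 100K).
PAGE_LIMIT = 3

# In-memory stand-in for the table: partition key (rowkey, page) -> rows.
Table = Dict[Tuple[bytes, int], List[dict]]


def find_latest_page(table: Table, rowkey: bytes) -> Optional[int]:
    """Probe page 1, 2, 3, ... until a page is missing.

    Each membership check models one read request, so the cost grows
    with the number of pages. Returns None if rowkey has no pages.
    """
    page = 0
    while (rowkey, page + 1) in table:
        page += 1
    return page or None


def append_row(table: Table, rowkey: bytes, row: dict) -> int:
    """Append a row to the last page, rolling over to a new page when full."""
    page = find_latest_page(table, rowkey) or 1
    if len(table.get((rowkey, page), [])) >= PAGE_LIMIT:
        page += 1
    table.setdefault((rowkey, page), []).append(row)
    return page


if __name__ == "__main__":
    table: Table = {}
    for i in range(7):
        append_row(table, b"rowkey1", {"col1": str(i)})
    latest = find_latest_page(table, b"rowkey1")
    print(latest, table[(b"rowkey1", latest)][-1])
```

What I am after is a way to get the last row of the last page without the probing loop and without maintaining a separate table that tracks the page count.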

On Wed, Oct 12, 2016 at 1:37 AM, Dorian Hoxha <dorian.ho...@gmail.com>
wrote:

> There are some issues working with larger partitions.
> HBase doesn't do what you say! You also have to be careful in HBase not
> to create large rows! But since they are globally sorted, you can easily
> sort between them and create small rows.
>
> In my opinion, Cassandra people are wrong when they say "globally
> sorted is the devil!", while fb/google/etc. actually use globally sorted
> most of the time! You have to be careful though (just like with random
> partitioning).
>
> Can you tell what rowkey1, page1, col(x) actually are? Maybe there is a
> way.
> By the most "recent", do you mean there's a timestamp in there?
>
> On Wed, Oct 12, 2016 at 9:58 AM, Kant Kodali <k...@peernova.com> wrote:
>
>> Hi All,
>>
>> I understand Cassandra can have a maximum of 2B rows per partition, but in
>> practice some people seem to suggest the magic number is 100K. Why not
>> create another partition/rowkey automatically (whenever we reach a safe
>> limit that we consider efficient) with an auto-increment bigint as
>> a suffix appended to the new rowkey, so that the driver can return the new
>> rowkey, indicating that there is a new partition, and so on? Now, I
>> understand this would involve allowing partial row key searches, which
>> Cassandra currently doesn't do (but I believe HBase does), and thinking
>> about token ranges and potentially many other things.
>>
>> My current problem is this:
>>
>> I have a row key followed by a bunch of columns (this is not time series
>> data), and these columns can grow to any number. Since I have a 100K limit
>> (or whatever the number is; say, some limit), I want to break the
>> partition into levels/pages:
>>
>> rowkey1, page1->col1, col2, col3......
>> rowkey1, page2->col1, col2, col3......
>>
>> Now say my Cassandra db is populated with data, my application has just
>> booted up, and I want the most recent value of a certain partition,
>> but I don't know which page it belongs to because the application has
>> only just started. How do I solve this in the most efficient way possible
>> in Cassandra today? I understand I can create MVs or other tables that
>> hold auxiliary data such as the number of pages per partition and so on,
>> but that involves the maintenance cost of that other table, which I
>> cannot really afford because I already have MVs and secondary indexes
>> for other good reasons. So it would be great if someone could explain the
>> best way possible as of today with Cassandra. By best way I mean: is it
>> possible with one request? If yes, then how? If not, then what is the
>> next best way to solve this?
>>
>> Thanks,
>> kant
>>
>
>
