Large partitions

2018-09-13 Thread Gedeon Kamga
Folks,

Based on the information found here
https://docs.datastax.com/en/dse-planning/doc/planning/planningPartitionSize.html,
the recommended limit for a partition size is 100MB. Even though DataStax
clearly states that this is a rule of thumb, some team members are claiming
that our Cassandra *write* performance is very slow because the partitions on
some tables are over 100MB. I know for a fact that this rule has changed
since 2.2. Starting with Cassandra 2.2, the new rule of thumb for partition
size is *a few hundred MB*, given the improvements in the architecture.
Now, I am unable to find the reference (maybe I got it at a Cassandra
training by DataStax). I would like to share it with my team. Did anyone
come across this information? If yes, can you please share it?
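
For what it's worth, here is one way to check how large the partitions in a
table actually are (the keyspace/table names below are placeholders; both
are standard nodetool commands, called cfstats/cfhistograms before 3.0):

nodetool tablestats <keyspace>.<table>        # reports "Compacted partition maximum bytes"
nodetool tablehistograms <keyspace> <table>   # partition size percentiles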

Thanks!


Re: Decommissioned node cluster shows as down

2017-05-20 Thread Gedeon Kamga
If nodetool status shows the node as DN, then you will need to tell the
cluster to forget that node by running nodetool removenode. If that doesn't
work, you may run nodetool assassinate - the last-resort command that
forcibly removes the node from gossip.
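
A minimal sketch (the host ID comes from nodetool status; note that
assassinate only exists as a nodetool subcommand from 2.2 onward - on 2.1
it is only reachable over JMX via Gossiper's unsafeAssassinateEndpoint):

nodetool status                     # note the Host ID of the DN node
nodetool removenode <host-id>       # tell the cluster to forget the node
nodetool removenode status          # check progress if it takes a while
nodetool assassinate <ip-address>   # last resort, 2.2+ only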

Thanks,
Gedeon

On Tue, May 16, 2017 at 12:01 PM, Mark Furlong wrote:

> I have a node I decommissioned on a large ring using 2.1.12. The node
> completed the decommission process and is no longer communicating with the
> rest of the cluster. However when I run a nodetool status on any node in
> the cluster it shows the node as ‘DN’. Why is this and should I just run a
> removenode now?
>
>
>
> *Thanks,*
>
> *Mark Furlong*
>
> Sr. Database Administrator
>
> *mfurl...@ancestry.com*
> M: 801-859-7427
>
> O: 801-705-7115
>
> 1300 W Traverse Pkwy
>
> Lehi, UT 84043


Re: Reg:- Data Modelling Concepts

2017-05-20 Thread Gedeon Kamga
One issue you will run into with this data model is unbounded partition
growth. Each partition will keep growing indefinitely over time, and you
risk hitting the hard limit of 2 billion cells per partition (in practice,
performance suffers long before that).

Consider a composite partition key to bound partition growth.
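
A minimal sketch, reusing the book_title_by_user table from this thread and
adding a hypothetical month bucket to the partition key, so that each
partition only ever holds one month of updates:

CREATE TABLE book_title_by_user (
  book_title text,
  month text,        -- hypothetical time bucket, e.g. '2017-05'
  book_id uuid,
  user_id uuid,
  ts timeuuid,
  PRIMARY KEY ((book_title, month), user_id, book_id, ts)
);

Reads then have to supply the bucket (WHERE book_title = ? AND month = ?)
and step back month by month for older history.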

Thanks,
Gedeon


On Wed, May 17, 2017 at 10:56 PM, Anthony Grasso wrote:

> Hi Nandan,
>
> If there is a requirement to answer a query "What are the changes to a
> book made by a particular user?", then yes the schema you have proposed can
> work. To obtain the list of updates for a book by a user from the
> *book_title_by_user* table, you will need the partition key (*book_title*),
> the first clustering key (*book_id*), and the second clustering key
> (*user_id*).
>
> i.e. SELECT * FROM book_title_by_user WHERE book_title = ? AND
> book_id = ? AND user_id = ?;
>
> If the book_id is unnecessary for answering the above query, it may be
> worth changing the primary key ordering of the *book_title_by_user* table
> to the following.
>
> CREATE TABLE book_title_by_user(
>   book_title text,
>   book_id uuid,
>   user_id uuid ,
>   ts timeuuid,
>   PRIMARY KEY (book_title, user_id, book_id, ts)
> );
>
> This will then simplify the select statement to
>
> SELECT * FROM book_title_by_user WHERE book_title = ? AND
> user_id = ?;
>
> Kind regards,
> Anthony
>
> On 17 May 2017 at 13:05, @Nandan@ wrote:
>
>> Hi Jon,
>>
>> We need to keep track of all updates, so that a 'User' of our platform
>> can check what changes were made before.
>> I am thinking along these lines:
>> CREATE TABLE book_info (
>> book_id uuid,
>> book_title text,
>> author_name text,
>> updated_at timestamp,
>> PRIMARY KEY(book_id));
>> This table will contain the details of every book, with its latest
>> update.
>> CREATE TABLE book_title_by_user(
>> book_title text,
>> book_id uuid,
>> user_id uuid,
>> ts timeuuid,
>> PRIMARY KEY (book_title, book_id, user_id, ts));
>> This table will contain the history of updates to each book, which can be
>> made by multiple users, like MANY TO MANY.
>>
>> What do you think of this?
>>
>> On Wed, May 17, 2017 at 9:44 AM, Jonathan Haddad wrote:
>>
>>> I don't understand why you need to store the old value a second time.
>>> If you know that the value went from A -> B -> C, just store the new value,
>>> not the old.  You can see that it changed from A->B->C without storing it
>>> twice.
>>>
>>> On Tue, May 16, 2017 at 6:36 PM @Nandan@ wrote:
>>>
 The requirement is to create a DB in which we have to keep the updated
 values, as well as which user updated the particular book's details and
 what they updated.

 We would like to create a schema which stores the book info, as well as
 the history of updates made to book_title, author, publisher, and price.
 We want to store what the old data was and what new data was written, and
 also to check which user made the relevant change, because if some changes
 were not made correctly, they can review the changes and revert based on
 the old values.
 We are trying to make a USER-based schema.

 For example:-
 id:- 1
 Name:- Harry Poter
 Author:- JK Rolling

 New Update Done by user_id 2:-
 id :- 1
 Name:- Harry Pottor
 Author:- J.K. Rolls

 Update history also need to store as :-
 User_id :- 2
 Old Author :- JK Rolling
 New Author :- J.K. Rolls

 So I need to update the details of a book, which is done by an UPSERT,
 but I also have to keep details of which user updated it and what was
 updated.


 One thing that helps define the schema is knowing up front what queries
 will be made to the database.
 A few queries that the database needs to answer:
 What are the current details of a book?
 What is the most recent update to a particular book?
 What are the updates that have been made to a particular book?
 What are the details for a particular update?


 Updates will happen to fields like title, name, author, price, and
 publisher, so the update frequency will not be very high.

 Best Regards,
 Nandan

>>>
>>
>