Re: Log4j vulnerability

2022-01-11 Thread Anthony Grasso
Hi Arvinder,

You are correct; tlp-stress includes Log4j as one of its libraries and
users will need to update the JAR file.

On 16th December 2021, tlp-stress was updated [1] to include Log4j 2.16.0,
which fixed CVE-2021-45046. Version 5.0.0 was released, which included this
change.

Unfortunately, further security issues were identified in Log4j 2.16.0. On
10th January 2022, tlp-stress was updated again [2] to include Log4j 2.17.1,
which fixed CVE-2021-45105 and CVE-2021-44832. A new version of tlp-stress
will be released soon which will include these updates.

For now, please build and use the latest version of the master branch to get
the latest patch.
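
If you want to confirm which Log4j version a given build bundles, a minimal
sketch (the install path below is a placeholder, not the actual tlp-stress
layout):

    import os
    import re

    # hypothetical install location - point this at your tlp-stress lib directory
    lib_dir = "/opt/tlp-stress/lib"

    for name in sorted(os.listdir(lib_dir)):
        m = re.match(r"log4j-core-(\d+)\.(\d+)\.(\d+)\.jar", name)
        if m:
            version = tuple(int(x) for x in m.groups())
            status = "OK" if version >= (2, 17, 1) else "VULNERABLE - upgrade"
            print(name, status)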

Kind regards,
Anthony

[1]
https://github.com/thelastpickle/tlp-stress/commit/298135e2bfc6d4d23f04154f098c3592dd3b32f0
[2]
https://github.com/thelastpickle/tlp-stress/commit/2d4542c27d3f1c0e24899c01247b9a8ee3c9a238

On Tue, 11 Jan 2022 at 16:56, Arvinder Dhillon 
wrote:

> If anyone uses the tlp-stress tool, note that it uses Log4j. Even if it is
> not in use most of the time, you might want to remove or upgrade the JAR.
>
> On Mon, Dec 13, 2021 at 3:58 PM Bowen Song  wrote:
>
>> Do you mean the log4j-over-slf4j-#.jar? If so, please read:
>> http://slf4j.org/log4shell.html
>>
>> On 13/12/2021 23:48, Rahul Reddy wrote:
>>
>> Hello,
>>
>>
>> I see this JAR, log4j-over-slf4j-1.7.7.jar. Is it impacted by the
>> vulnerability? What is that JAR used for?
>>
>>
>>
>> On Sat, Dec 11, 2021 at 12:45 PM Brandon Williams 
>> wrote:
>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-5883
>>>
>>> As that ticket shows, Apache Cassandra has never used log4j2.
>>>
>>> On Sat, Dec 11, 2021 at 11:07 AM Abdul Patel 
>>> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > Any idea if any of the open source Cassandra versions are impacted by the
>>> log4j vulnerability that was reported on Dec 9th?
>>>
>>


Re: Migrating Cassandra from 3.11.11 to 4.0.0 vs num_tokens

2021-09-05 Thread Anthony Grasso
Hi Jean,

This is a really good question.

As Erick mentioned, if you want to change your cluster's *num_tokens* to 16
to match the 4.0 default, you will need to perform a datacenter migration.
Feel free to read over this blog post written by The Last Pickle, which will
provide background details about the process.
The information should follow the Apache website runbook (when published)
fairly closely.

Kind regards,

On Sat, 4 Sept 2021 at 20:45, Jean Tremblay  wrote:

> Great Thank you for the answer and the link!
>
>
> On 4 Sep 2021, at 11:35, Erick Ramirez  wrote:
>
> It isn't possible to change the tokens on a node once it is already part
> of the cluster. Cassandra won't allow you to do it because it will make the
> data already on disk unreadable. You'll need to either configure new nodes
> or add a new DC. I've answered an identical question in
> https://community.datastax.com/questions/12213/ where I've provided steps
> for the 2 options. I hope to draft a runbook and get it published on the
> Apache website in the coming days. Cheers!
>
>
>


Re: Generating evenly distributed tokens for vnodes

2020-05-28 Thread Anthony Grasso
Hi Kornel,

Great use of the script for generating initial tokens! I agree that you can
achieve an optimal token distribution in a cluster using such a method.

One thing to think about is the process for expanding the size of the
cluster in this case. For example, consider the scenario where you want to
insert a single new node into the cluster. To do this you would need to
calculate what the new token ranges should be for all nodes, including the
new node. You would then need to reassign existing tokens to other nodes
using 'nodetool move'. You would likely need to call this command a few
times to perform a few movements in order to achieve the newly calculated
token assignments. Once the "gap" in the token ranges has been created, you
would then update the initial_token property for the existing nodes in the
cluster. Finally, you could then insert the new node with the assigned
tokens.
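
To get a feel for the size of that reshuffle, a rough sketch (assuming
single-token nodes and ideal, evenly spaced Murmur3 tokens):

    def ideal_tokens(num_nodes):
        # evenly spaced tokens across the Murmur3 range [-2**63, 2**63)
        return [((2**64 // num_nodes) * i) - 2**63 for i in range(num_nodes)]

    print(ideal_tokens(3))  # the current three-node ring
    print(ideal_tokens(4))  # the ring after inserting a fourth node

All but one of the existing tokens change between the two lists, which is
why each single-node expansion needs a series of 'nodetool move' operations.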

While the above process could be used to maintain an optimal token
distribution in a cluster, it does increase operational overhead. This is
where allocate_tokens_for_keyspace and
allocate_tokens_for_local_replication_factor (4.0 only) play a critical
role. They save the operational overhead when changing the size of the
cluster. In addition, from my experience they do a pretty good job of
keeping the token ranges evenly distributed when expanding the cluster, even
in the case where a low number for num_tokens is used. If expanding the
cluster size is required during an emergency, using an allocate_token_*
setting would be the simplest and most reliable way to quickly insert a node
while maintaining reasonable token distribution.

The only other way to expand the cluster and maintain even token
distribution without using an allocate_token_* setting is to double the
size of the cluster each time. Obviously this has its own drawbacks in
terms of increased cost in both money and time compared to inserting a
single node.

Hope this helps.

Kind regards,
Anthony

On Thu, 28 May 2020 at 04:52, Kornel Pal  wrote:

> As I understand, the previous discussion is about using
> allocate_tokens_for_keyspace for allocating tokens for most of the
> nodes. On the other hand, I am proposing to generate all the tokens for
> all the nodes using a Python script.
>
> This seems to result in perfectly even token ownership distribution
> across all the nodes for all possible replication factors, thus being an
> improvement over using allocate_tokens_for_keyspace.
>
> Elliott Sims wrote:
> > There's also a slightly older mailing list discussion on this subject
> > that goes into detail on this sort of strategy:
> > https://www.mail-archive.com/user@cassandra.apache.org/msg60006.html
> >
> > I've been approximately following it, repeating steps 3-6 for the first
> > host in each "rack" (replica, since I have 3 racks and RF=3), then 8-10 for
> > the remaining hosts in the new datacenter. So far, so good (sample size
> > of 1), but it's a pretty painstaking process.
> >
> > This should get a lot simpler with Cassandra 4+'s
> > "allocate_tokens_for_local_replication_factor" option, which will
> > default to 3.
> >
> > On Wed, May 27, 2020 at 4:34 AM Kornel Pal  > > wrote:
> >
> > Hi,
> >
> > Generating ideal tokens for single-token datacenters is well understood
> > and documented, but there is much less information available on
> > generating tokens with even ownership distribution when using vnodes.
> > The best description I could find on token generation for vnodes is
> > https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
> >
> > While allocate_tokens_for_keyspace results in much more even ownership
> > distribution than random allocation, and does a great job at balancing
> > ownership when adding new nodes, using it for creating a new datacenter
> > results in less than ideal ownership distribution.
> >
> > After some experimentation, I found that it is possible to generate all
> > the tokens for a new datacenter with an extended version of the Python
> > script presented in the above blog post. Using these tokens seems to
> > result in perfectly even ownership distribution with various
> > token/node/rack configurations for all possible replication factors.
> >
> > Murmur3Partitioner:
> >   >>> datacenter_offset = 0
> >   >>> num_tokens = 4
> >   >>> num_racks = 3
> >   >>> num_nodes = 3
> >   >>> print "\n".join(['[Rack #{}, Node #{}] initial_token: {}'.format(r + 1, n + 1, ','.join([str(((2**64 / (num_tokens * num_nodes * num_racks)) * (t * num_nodes * num_racks + n * num_racks + r)) - 2**63 + datacenter_offset) for t in range(num_tokens)])) for r in range(num_racks) for n in range(num_nodes)])
> > [Rack #1, Node #1] initial_token: -9223372036854775808,-4611686018427387908,-8,4611686018427387892
> > [Rack #1, Node #2] 

Re: [EXTERNAL] Cassandra 3.11.X upgrades

2020-03-03 Thread Anthony Grasso
Manish is correct.

Upgrade the Cassandra version of a single node only. If that node is
behaving as expected (i.e. is in an Up/Normal state and no errors in the
logs), then upgrade the Cassandra version for each node one at a time. Be
sure to check that each node is running as expected. Once the Cassandra
version is upgraded on all nodes in the cluster and all nodes are in a
healthy state, then you upgrade the SSTables.
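
If you want to script that final upgradesstables pass, a minimal sketch
(assuming nodetool is on the PATH, JMX is reachable, and the addresses below
are placeholders):

    import subprocess

    # hypothetical node addresses - replace with your cluster's
    hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

    # run upgradesstables on one node at a time to limit the load
    for host in hosts:
        subprocess.run(["nodetool", "-h", host, "upgradesstables"], check=True)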

As Manish has pointed out, you want to be able to rollback the Cassandra
version easily if something goes wrong with the software upgrade.

If you have the disk space, I recommend taking a snapshot of the SSTables
on all nodes before upgrading the Cassandra software version. The snapshots
are a safeguard in case there is a major problem after upgrading the
SSTables.

Regards,


On Wed, 4 Mar 2020 at 15:08, manish khandelwal 
wrote:

> Should upgradesstables not be run after every node is upgraded? If we need
> to roll back then we will not be able to downgrade SSTables to the older
> version.
>
> Regards
> Manish
>
> On Tue, Mar 3, 2020 at 11:26 PM Hossein Ghiyasi Mehr <
> ghiyasim...@gmail.com> wrote:
>
>> It's safer to upgrade one node before upgrading the others, to avoid
>> downtime.
>> After upgrading the binary and package, run upgradesstables on the candidate
>> node, then do it on all cluster nodes one by one.
>> *---*
>> *VafaTech  : A Total Solution for Data Gathering
>> & Analysis*
>> *---*
>>
>>
>> On Thu, Feb 13, 2020 at 9:27 PM Sergio  wrote:
>>
>>>
>>>- Verify that nodetool upgradesstables has completed successfully on
>>>all nodes from any previous upgrade
>>>- Turn off repairs and any other streaming operations (add/remove
>>>nodes)
>>>- Nodetool drain on the node that needs to be stopped (seeds first,
>>>preferably)
>>>- Stop an un-upgraded node (seeds first, preferably)
>>>- Install new binaries and configs on the down node
>>>- Restart that node and make sure it comes up clean (it will
>>>function normally in the cluster – even with mixed versions)
>>>- nodetool statusbinary to verify if it is up and running
>>>- Repeat for all nodes
>>>- Once the binary upgrade has been performed on all the nodes: Run
>>>upgradesstables on each node (as many at a time as your load will allow).
>>>Minor upgrades usually don’t require this step (only if the sstable format
>>>has changed), but it is good to check.
>>>- NOTE: in most cases applications can keep running and will not
>>>notice much impact – unless the cluster is overloaded and a single node
>>>down causes impact.
>>>
>>>
>>>
>>>I added 2 points to the list to clarify.
>>>
>>>Should we add this to a FAQ in the Cassandra docs or to Awesome
>>>Cassandra (https://cassandra.link/awesome/)?
>>>
>>>Thanks,
>>>
>>>Sergio
>>>
>>>
>>> Il giorno mer 12 feb 2020 alle ore 10:58 Durity, Sean R <
>>> sean_r_dur...@homedepot.com> ha scritto:
>>>
 Check the readme.txt for any upgrade notes, but the basic procedure is
 to:

- Verify that nodetool upgradesstables has completed successfully
on all nodes from any previous upgrade
- Turn off repairs and any other streaming operations (add/remove
nodes)
- Stop an un-upgraded node (seeds first, preferably)
- Install new binaries and configs on the down node
- Restart that node and make sure it comes up clean (it will
function normally in the cluster – even with mixed versions)
- Repeat for all nodes
- Run upgradesstables on each node (as many at a time as your load
will allow). Minor upgrades usually don’t require this step (only if the
sstable format has changed), but it is good to check.
- NOTE: in most cases applications can keep running and will not
notice much impact – unless the cluster is overloaded and a single node
down causes impact.







 Sean Durity – Staff Systems Engineer, Cassandra



 *From:* Sergio 
 *Sent:* Wednesday, February 12, 2020 11:36 AM
 *To:* user@cassandra.apache.org
 *Subject:* [EXTERNAL] Cassandra 3.11.X upgrades



 Hi guys!

 How do you usually upgrade your cluster for minor version upgrades?

 I tried to add a node with 3.11.5 version to a test cluster with 3.11.4
 nodes.

 Is there any restriction?

 Best,

 Sergio


Re: Should we use Materialised Views or ditch them ?

2020-03-01 Thread Anthony Grasso
Hi Tobias,

I have had a similar experience to Jon, where I have seen Materialized
Views cause major issues in clusters. I too recommend avoiding them.

Regards,
Anthony

On Sat, 29 Feb 2020 at 07:37, Jon Haddad  wrote:

> I also recommend avoiding them.  I've seen too many clusters fall over as
> a result of their usage.
>
> On Fri, Feb 28, 2020 at 9:52 AM Max C.  wrote:
>
>> The general view of the community is that you should *NOT* use them in
>> production, due to multiple serious outstanding issues (see Jira).  We used
>> them quite a bit when they first came out and have since rolled back all
>> uses except for the absolute most basic cases (ex:  a table with 30K rows
>> that isn’t updated).  If we were to do it over, we would not use them at
>> all.
>>
>> - Max
>>
>> On Feb 28, 2020, at 7:07 am, Tobias Eriksson 
>> wrote:
>>
>> Hi
>>  A debate has surfaced in my company, whether to keep or remove
>> Materialized Views
>> The Datastax FAQ says sure thing, go ahead and use it
>> https://docs.datastax.com/en/dse/5.1/cql/cql/cql_using/faqMV.html
>> But know the limitations
>>
>> https://docs.datastax.com/en/dse/5.1/cql/cql/cql_using/knownLimitationsMV.html
>> and best practices
>>
>> https://docs.datastax.com/en/dse/5.1/cql/cql/cql_using/bestPracticesMV.html
>>
>> What is the community take on using MV(Materialized Views) in production ?
>>
>> -Tobias
>>
>>
>>


Re: [EXTERNAL] How to reduce vnodes without downtime

2020-02-02 Thread Anthony Grasso
Hi Sergio,

There is a misunderstanding here. My post makes no recommendation for the
value of num_tokens. Rather, it focuses on how to use
the allocate_tokens_for_keyspace setting when creating a new cluster.

Whilst a value of 4 is used for num_tokens in the post, it was chosen for
demonstration purposes. Specifically, it makes:

   - the uneven token distribution in a small cluster very obvious,
   - identifying the endpoints displayed in nodetool ring easy, and
   - the initial_token setup less verbose and easier to follow.

I will add an editorial note to the post with the above information
so there is no confusion about why 4 tokens were used.

I would only consider moving a cluster to 4 tokens if it is larger than 100
nodes. If you read through the paper that Erick mentioned, written by Joe
Lynch & Josh Snyder, they show that num_tokens impacts the availability of
large-scale clusters.

If you are after more details about the trade-offs between different sized
token values, please see the discussion on the dev mailing list: "[Discuss]
num_tokens default in Cassandra 4.0
<https://www.mail-archive.com/search?l=dev%40cassandra.apache.org&q=subject%3A%22%5C%5BDiscuss%5C%5D+num_tokens+default+in+Cassandra+4.0%22&o=oldest>
".

Regards,
Anthony

On Sat, 1 Feb 2020 at 10:07, Sergio  wrote:

>
> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
> This is the article with the 4 token recommendation.
> @Erick Ramirez, which is the dev thread for the default 32 tokens
> recommendation?
>
> Thanks,
> Sergio
>
> Il giorno ven 31 gen 2020 alle ore 14:49 Erick Ramirez <
> flightc...@gmail.com> ha scritto:
>
>> There's an active discussion going on right now in a separate dev thread.
>> The current "default recommendation" is 32 tokens. But there's a push for 4
>> in combination with allocate_tokens_for_keyspace from Jon Haddad & co
>> (based on a paper from Joe Lynch & Josh Snyder).
>>
>> If you're satisfied with the results from your own testing, go with 4
>> tokens. And that's the key -- you must test, test, TEST! Cheers!
>>
>> On Sat, Feb 1, 2020 at 5:17 AM Arvinder Dhillon 
>> wrote:
>>
>>> What is the recommended vnodes value now? I read 8 in later Cassandra 3.x.
>>> Is the new recommendation 4 now even in version 3.x (asking for 3.11)?
>>> Thanks
>>>
>>> On Fri, Jan 31, 2020 at 9:49 AM Durity, Sean R <
>>> sean_r_dur...@homedepot.com> wrote:
>>>
>>>> These are good clarifications and expansions.
>>>>
>>>>
>>>>
>>>> Sean Durity
>>>>
>>>>
>>>>
>>>> *From:* Anthony Grasso 
>>>> *Sent:* Thursday, January 30, 2020 7:25 PM
>>>> *To:* user 
>>>> *Subject:* Re: [EXTERNAL] How to reduce vnodes without downtime
>>>>
>>>>
>>>>
>>>> Hi Maxim,
>>>>
>>>>
>>>>
>>>> Basically what Sean suggested is the way to do this without downtime.
>>>>
>>>>
>>>>
>>>> To clarify, the *three* steps following the "Decommission each
>>>> node in the DC you are working on" step should be applied to *only*
>>>> the decommissioned nodes. So where it says "*all nodes*" or "*every
>>>> node*" it applies to only the decommissioned nodes.
>>>>
>>>>
>>>>
>>>> In addition, the step that says "Wipe data on all the nodes", I would
>>>> delete all files in the following directories on the decommissioned nodes.
>>>>
>>>>- data (usually located in /var/lib/cassandra/data)
>>>>- commitlog (usually located in /var/lib/cassandra/commitlog)
>>>>- hints (usually located in /var/lib/cassandra/hints)
>>>>- saved_caches (usually located in /var/lib/cassandra/saved_caches)
>>>>
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> Anthony
>>>>
>>>>
>>>>
>>>> On Fri, 31 Jan 2020 at 03:05, Durity, Sean R <
>>>> sean_r_dur...@homedepot.com> wrote:
>>>>
>>>> Your procedure won’t work very well. On the first node, if you switched
>>>> to 4, you would end up with only a tiny fraction of the data (because the
>>>> other nodes would still be at 256). I updated a large cluster (over 150
>>>> nodes – 2 DCs) to smaller number of vnodes. The basic outline was this:
>>>>
>>>>
>>>>
>>>>- Stop all repairs

Re: [EXTERNAL] How to reduce vnodes without downtime

2020-01-30 Thread Anthony Grasso
Hi Maxim,

Basically what Sean suggested is the way to do this without downtime.

To clarify, the *three* steps following the "Decommission each node in
the DC you are working on" step should be applied to *only* the
decommissioned nodes. So where it says "*all nodes*" or "*every node*" it
applies to only the decommissioned nodes.

In addition, for the step that says "Wipe data on all the nodes", I would
delete all files in the following directories on the decommissioned nodes
(a sketch follows the list).

   - data (usually located in /var/lib/cassandra/data)
   - commitlog (usually located in /var/lib/cassandra/commitlog)
   - hints (usually located in /var/lib/cassandra/hints)
   - saved_caches (usually located in /var/lib/cassandra/saved_caches)
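
A cautious sketch of that wipe, assuming the default package layout above
(adjust the base path to your install):

    import os
    import shutil

    base = "/var/lib/cassandra"
    for d in ["data", "commitlog", "hints", "saved_caches"]:
        path = os.path.join(base, d)
        if os.path.isdir(path):
            shutil.rmtree(path)  # remove the directory and its contents
            os.makedirs(path)    # recreate it empty for the node to reuse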


Cheers,
Anthony

On Fri, 31 Jan 2020 at 03:05, Durity, Sean R 
wrote:

> Your procedure won’t work very well. On the first node, if you switched to
> 4, you would end up with only a tiny fraction of the data (because the
> other nodes would still be at 256). I updated a large cluster (over 150
> nodes – 2 DCs) to smaller number of vnodes. The basic outline was this:
>
>
>
>- Stop all repairs
>- Make sure the app is running against one DC only
>- Change the replication settings on keyspaces to use only 1 DC
>(basically cutting off the other DC)
>- Decommission each node in the DC you are working on. Because the
>replication setting are changed, no streaming occurs. But it releases the
>token assignments
>- Wipe data on all the nodes
>- Update configuration on every node to your new settings, including
>auto_bootstrap = false
>- Start all nodes. They will choose tokens, but not stream any data
>- Update replication factor for all keyspaces to include the new DC
>- I disabled binary on those nodes to prevent app connections
>- Run nodetool rebuild with -dc (other DC) on as many nodes as your
>system can safely handle until they are all rebuilt.
>- Re-enable binary (and app connections to the rebuilt DC)
>- Turn on repairs
>- Rest for a bit, then reverse the process for the remaining DCs
>
>
>
>
>
>
>
> Sean Durity – Staff Systems Engineer, Cassandra
>
>
>
> *From:* Maxim Parkachov 
> *Sent:* Thursday, January 30, 2020 10:05 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] How to reduce vnodes without downtime
>
>
>
> Hi everyone,
>
>
>
> with discussion about reducing default vnodes in version 4.0 I would like
> to ask, what would be optimal procedure to perform reduction of vnodes in
> existing 3.11.x cluster which was set up with default value 256. Cluster
> has 2 DCs with 5 nodes each and RF=3. There is one more restriction: I can
> neither add more servers nor create an additional DC; everything is physical.
> This should be done without downtime.
>
>
>
> My idea for such procedure would be
>
>
>
> for each node:
>
> - decommission node
>
> - set auto_bootstrap to true and vnodes to 4
>
> - start and wait till node joins cluster
>
> - run cleanup on rest of nodes in cluster
>
> - run repair on whole cluster (not sure if needed after cleanup)
>
> - set auto_bootstrap to false
>
> repeat for each node
>
>
>
> rolling restart of cluster
>
> cluster repair
>
>
>
> Does this sound right? My concern is that after the decommission, the node
> will start on the same IP, which could create some confusion.
>
>
>
> Regards,
>
> Maxim.
>
>


Re: Uneven token distribution with allocate_tokens_for_keyspace

2020-01-27 Thread Anthony Grasso
Hi Leo,

The token assignment for each node in the cluster must be unique regardless
of the datacenter they are in. This is because the range of tokens
available to assign to nodes is per cluster. Token allocation is performed
per node at a global level. A datacenter helps define the way data is
replicated and has no influence on how tokens are assigned to nodes.

For example, if a new node is assigned one or more of the tokens already
owned by another node in the cluster, the new node will take ownership of
those tokens. This will happen regardless of which datacenter either node
is in.
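
To picture this, a toy sketch of how ranges fall out of the single global
token list (the tokens and node names are hypothetical):

    # each token owns the range from the previous token (wrapping around)
    # up to itself - datacenters play no part in this calculation
    nodes = {"dc1-node1": -9223372036854775808,
             "dc2-node1": -3074457345618258603,
             "dc1-node2": 3074457345618258602}
    ring = sorted((token, name) for name, token in nodes.items())
    for i, (token, name) in enumerate(ring):
        prev = ring[i - 1][0]  # for i == 0 this wraps to the last token
        print("{} owns ({}, {}]".format(name, prev, token))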

Regards,
Anthony

On Sat, 25 Jan 2020 at 02:11, Léo FERLIN SUTTON 
wrote:

> Hi Anthony !
>
> I have a follow-up question :
>
> Check to make sure that no other node in the cluster is assigned any of
>> the four tokens specified above. If there is another node in the cluster
>> that is assigned one of the above tokens, increment the conflicting token
>> by values of one until no other node in the cluster is assigned that token
>> value. The idea is to make sure that these four tokens are unique to the
>> node.
>
>
> I don't understand this part of the process. Why do tokens conflict if the
> nodes owning them are in a different datacenter?
>
> Regards,
>
> Leo
>
> On Thu, Dec 5, 2019 at 1:00 AM Anthony Grasso 
> wrote:
>
>> Hi Enrico,
>>
>> Glad to hear the problem has been resolved and thank you for the feedback!
>>
>> Kind regards,
>> Anthony
>>
>> On Mon, 2 Dec 2019 at 22:03, Enrico Cavallin 
>> wrote:
>>
>>> Hi Anthony,
>>> thank you for your hints, now the new DC is well balanced within 2%.
>>> I did read your article, but I thought it was needed only for new
>>> "clusters", not also for new "DCs"; but RF is per DC so it makes sense.
>>>
>>> You TLP guys are doing a great job for Cassandra community.
>>>
>>> Thank you,
>>> Enrico
>>>
>>>
>>> On Fri, 29 Nov 2019 at 05:09, Anthony Grasso 
>>> wrote:
>>>
>>>> Hi Enrico,
>>>>
>>>> This is a classic chicken and egg problem with the
>>>> allocate_tokens_for_keyspace setting.
>>>>
>>>> The allocate_tokens_for_keyspace setting uses the replication factor
>>>> of a DC keyspace to calculate the token allocation when a node is added to
>>>> the cluster for the first time.
>>>>
>>>> Nodes need to be added to the new DC before we can replicate the
>>>> keyspace over to it. Herein lies the problem. We are unable to use
>>>> allocate_tokens_for_keyspace unless the keyspace is replicated to the
>>>> new DC. In addition, as soon as you change the keyspace replication to the
>>>> new DC, new data will start to be written to it. To work around this issue
>>>> you will need to do the following.
>>>>
>>>>1. Decommission all the nodes in the *dcNew*, one at a time.
>>>>2. Once all the *dcNew* nodes are decommissioned, wipe the contents
>>>>in the *commitlog*, *data*, *saved_caches*, and *hints* directories
>>>>of these nodes.
>>>>3. Make the first node to add into the *dcNew* a seed node. Set the
>>>>seed list of the first node with its IP address and the IP addresses of
>>>>the other seed nodes in the cluster.
>>>>4. Set the *initial_token* setting for the first node. You can
>>>>calculate the values using the algorithm in my blog post:
>>>>
>>>> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html.
>>>>For convenience I have calculated them:
>>>>*-9223372036854775808,-4611686018427387904,0,4611686018427387904*.
>>>>Note, remove the *allocate_tokens_for_keyspace* setting from the
>>>>*cassandra.yaml* file for this (seed) node.
>>>>5. Check to make sure that no other node in the cluster is assigned
>>>>any of the four tokens specified above. If there is another node in the
>>>>cluster that is assigned one of the above tokens, increment the conflicting
>>>>token by values of one until no other node in the cluster is assigned that
>>>>token value. The idea is to make sure that these four tokens are unique to
>>>>the node.
>>>>6. Add the seed node to cluster. Make sure it is listed in *dcNew *by
>>>>checking nodetool status.
>>>>7. Crea

Re: Uneven token distribution with allocate_tokens_for_keyspace

2019-12-04 Thread Anthony Grasso
Hi Enrico,

Glad to hear the problem has been resolved and thank you for the feedback!

Kind regards,
Anthony

On Mon, 2 Dec 2019 at 22:03, Enrico Cavallin 
wrote:

> Hi Anthony,
> thank you for your hints, now the new DC is well balanced within 2%.
> I did read your article, but I thought it was needed only for new
> "clusters", not also for new "DCs"; but RF is per DC so it makes sense.
>
> You TLP guys are doing a great job for Cassandra community.
>
> Thank you,
> Enrico
>
>
> On Fri, 29 Nov 2019 at 05:09, Anthony Grasso 
> wrote:
>
>> Hi Enrico,
>>
>> This is a classic chicken and egg problem with the
>> allocate_tokens_for_keyspace setting.
>>
>> The allocate_tokens_for_keyspace setting uses the replication factor of
>> a DC keyspace to calculate the token allocation when a node is added to the
>> cluster for the first time.
>>
>> Nodes need to be added to the new DC before we can replicate the keyspace
>> over to it. Herein lies the problem. We are unable to use
>> allocate_tokens_for_keyspace unless the keyspace is replicated to the
>> new DC. In addition, as soon as you change the keyspace replication to the
>> new DC, new data will start to be written to it. To work around this issue
>> you will need to do the following.
>>
>>1. Decommission all the nodes in the *dcNew*, one at a time.
>>2. Once all the *dcNew* nodes are decommissioned, wipe the contents
>>in the *commitlog*, *data*, *saved_caches*, and *hints* directories
>>of these nodes.
>>3. Make the first node to add into the *dcNew* a seed node. Set the
>>    seed list of the first node with its IP address and the IP addresses of
>>    the other seed nodes in the cluster.
>>4. Set the *initial_token* setting for the first node. You can
>>calculate the values using the algorithm in my blog post:
>>
>> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html.
>>For convenience I have calculated them:
>>*-9223372036854775808,-4611686018427387904,0,4611686018427387904*.
>>Note, remove the *allocate_tokens_for_keyspace* setting from the
>>*cassandra.yaml* file for this (seed) node.
>>5. Check to make sure that no other node in the cluster is assigned
>>any of the four tokens specified above. If there is another node in the
>>    cluster that is assigned one of the above tokens, increment the conflicting
>>token by values of one until no other node in the cluster is assigned that
>>token value. The idea is to make sure that these four tokens are unique to
>>the node.
>>6. Add the seed node to cluster. Make sure it is listed in *dcNew *by
>>checking nodetool status.
>>7. Create a dummy keyspace in *dcNew* that has a replication factor
>>of 2.
>>8. Set the *allocate_tokens_for_keyspace* value to be the name of the
>>dummy keyspace for the other two nodes you want to add to *dcNew*.
>>Note remove the *initial_token* setting for these other nodes.
>>9. Set *auto_bootstrap* to *false* for the other two nodes you want
>>to add to *dcNew*.
>>10. Add the other two nodes to the cluster, one at a time.
>>11. If you are happy with the distribution, copy the data to *dcNew*
>>by running a rebuild.
>>
>>
>> Hope this helps.
>>
>> Regards,
>> Anthony
>>
>> On Fri, 29 Nov 2019 at 02:08, Enrico Cavallin 
>> wrote:
>>
>>> Hi all,
>>> I have an old datacenter with 4 nodes and 256 tokens each.
>>> I am now starting a new datacenter with 3 nodes and num_token=4
>>> and allocate_tokens_for_keyspace=myBiggestKeyspace in each node.
>>> Both DCs run Cassandra 3.11.x.
>>>
>>> myBiggestKeyspace has RF=3 in dcOld and RF=2 in dcNew. Now dcNew is very
>>> unbalanced.
>>> Also keyspaces with RF=2 in both DCs have the same problem.
>>> Did I miss something or even with  allocate_tokens_for_keyspace I have
>>> strong limitations with low num_token?
>>> Any suggestions on how to mitigate it?
>>>
>>> # nodetool status myBiggestKeyspace
>>> Datacenter: dcOld
>>> ===
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> --  Address   Load   Tokens   Owns (effective)  Host ID
>>>   Rack
>>> UN  x.x.x.x  515.83 GiB  256  76.2%
>>> fc462eb2-752f-4d26-aae3-84cb9c977b8a  rack1
>>> UN  x.x.x.x  504.09 GiB  256  72.7%
>>> 

Re: Uneven token distribution with allocate_tokens_for_keyspace

2019-11-28 Thread Anthony Grasso
Hi Enrico,

This is a classic chicken and egg problem with the
allocate_tokens_for_keyspace setting.

The allocate_tokens_for_keyspace setting uses the replication factor of a
DC keyspace to calculate the token allocation when a node is added to the
cluster for the first time.

Nodes need to be added to the new DC before we can replicate the keyspace
over to it. Herein lies the problem. We are unable to use
allocate_tokens_for_keyspace unless the keyspace is replicated to the new
DC. In addition, as soon as you change the keyspace replication to the new
DC, new data will start to be written to it. To work around this issue you
will need to do the following.

   1. Decommission all the nodes in the *dcNew*, one at a time.
   2. Once all the *dcNew* nodes are decommissioned, wipe the contents in
   the *commitlog*, *data*, *saved_caches*, and *hints* directories of
   these nodes.
   3. Make the first node to add into the *dcNew* a seed node. Set the seed
   list of the first node with its IP address and the IP addresses of the
   other seed nodes in the cluster.
   4. Set the *initial_token* setting for the first node. You can calculate
   the values using the algorithm in my blog post:
   
https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html.
   For convenience I have calculated them (see the snippet after this list):
   *-9223372036854775808,-4611686018427387904,0,4611686018427387904*.
   Note, remove the *allocate_tokens_for_keyspace* setting from the
   *cassandra.yaml* file for this (seed) node.
   5. Check to make sure that no other node in the cluster is assigned any
   of the four tokens specified above. If there is another node in the cluster
   that is assigned one of the above tokens, increment the conflicting token
   by values of one until no other node in the cluster is assigned that token
   value. The idea is to make sure that these four tokens are unique to the
   node.
   6. Add the seed node to cluster. Make sure it is listed in *dcNew *by
   checking nodetool status.
   7. Create a dummy keyspace in *dcNew* that has a replication factor of 2.
   8. Set the *allocate_tokens_for_keyspace* value to be the name of the
   dummy keyspace for the other two nodes you want to add to *dcNew*. Note
   remove the *initial_token* setting for these other nodes.
   9. Set *auto_bootstrap* to *false* for the other two nodes you want to
   add to *dcNew*.
   10. Add the other two nodes to the cluster, one at a time.
   11. If you are happy with the distribution, copy the data to *dcNew* by
   running a rebuild.
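
As a quick check of the values in step 4, a minimal Python 3 snippet (note
the integer division):

    # four evenly spaced tokens across the Murmur3 range [-2**63, 2**63)
    tokens = [((2**64 // 4) * i) - 2**63 for i in range(4)]
    print(",".join(str(t) for t in tokens))
    # -9223372036854775808,-4611686018427387904,0,4611686018427387904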


Hope this helps.

Regards,
Anthony

On Fri, 29 Nov 2019 at 02:08, Enrico Cavallin 
wrote:

> Hi all,
> I have an old datacenter with 4 nodes and 256 tokens each.
> I am now starting a new datacenter with 3 nodes and num_token=4
> and allocate_tokens_for_keyspace=myBiggestKeyspace in each node.
> Both DCs run Cassandra 3.11.x.
>
> myBiggestKeyspace has RF=3 in dcOld and RF=2 in dcNew. Now dcNew is very
> unbalanced.
> Also keyspaces with RF=2 in both DCs have the same problem.
> Did I miss something or even with  allocate_tokens_for_keyspace I have
> strong limitations with low num_token?
> Any suggestions on how to mitigate it?
>
> # nodetool status myBiggestKeyspace
> Datacenter: dcOld
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address   Load   Tokens   Owns (effective)  Host ID
> Rack
> UN  x.x.x.x  515.83 GiB  256  76.2%
> fc462eb2-752f-4d26-aae3-84cb9c977b8a  rack1
> UN  x.x.x.x  504.09 GiB  256  72.7%
> d7af8685-ba95-4854-a220-bc52dc242e9c  rack1
> UN  x.x.x.x  507.50 GiB  256  74.6%
> b3a4d3d1-e87d-468b-a7d9-3c104e219536  rack1
> UN  x.x.x.x  490.81 GiB  256  76.5%
> 41e80c5b-e4e3-46f6-a16f-c784c0132dbc  rack1
>
> Datacenter: dcNew
> ==
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address   Load   Tokens   Owns (effective)  Host ID
>  Rack
> UN  x.x.x.x   145.47 KiB  4    56.3%
> 7d089351-077f-4c36-a2f5-007682f9c215  rack1
> UN  x.x.x.x   122.51 KiB  4    55.5%
> 625dafcb-0822-4c8b-8551-5350c528907a  rack1
> UN  x.x.x.x   127.53 KiB  4    88.2%
> c64c0ce4-2f85-4323-b0ba-71d70b8e6fbf  rack1
>
> Thanks,
> -- ec
>


Re: datacorruption with cassandra 2.1.11

2019-05-16 Thread Anthony Grasso
Did you roll back to OpenJDK 1.7u181 or did you upgrade to a more recent
version?

On Thu, 16 May 2019 at 13:43, keshava  wrote:

> The Java version that we were using, and which turned out to be causing this
> issue, was OpenJDK 1.7 u191.
>
> On 16-May-2019 06:02, "sankalp kohli"  wrote:
>
>> Which exact version did you see this on?
>>
>> On Wed, May 15, 2019 at 12:03 PM keshava 
>> wrote:
>>
>>> I gave changing the Java version a try, and it worked. It seems to be
>>> some issue with the Java version chosen.
>>>
>>> On 10-May-2019 14:48, "keshava"  wrote:
>>>
 I will try changing the Java version.
 W.r.t. the other point about hardware, I have this issue in multiple setups,
 so I really doubt hardware is playing spoilsport here.

 On 10-May-2019 11:38, "Jeff Jirsa"  wrote:

> It’s going to be very difficult to diagnose remotely.
>
> I don’t run or have an opinion on jdk7 but I would suspect the
> following:
>
> - bad hardware (dimm, disk, network card,  motherboard, processor in
> the order)
> - bad jdk7. I’d be inclined to upgrade to 8 personally, but rolling
> back to previous version may not be a bad idea
>
>
> You’re in a tough spot if this is spreading. I’d personally be looking
> to try to isolate the source and roll forward or backward as quickly as
> possible. I don’t really suspect a Cassandra 2.1 bug here, but it’s possible
> I suppose. Take a snapshot now as you may need it to try to recover data
> later.
>
>
> --
> Jeff Jirsa
>
>
> On May 9, 2019, at 10:53 PM, keshava 
> wrote:
>
> Yes, we do have compression enabled using
> "org.apache.cassandra.io.compress.LZ4Compressor".
> It is spreading: as the number of inserts increases it spreads further.
> Yes, it did start with the JDK and OS upgrade.
>
> Best regards  :)
> keshava Hosahalli
>
>
> On Thu, May 9, 2019 at 7:11 PM Jeff Jirsa  wrote:
>
>> Do you have compression enabled on your table?
>>
>> Did this start with the JDK upgrade?
>>
>> Is the compression spreading, or is it contained to the same % of
>> entries?
>>
>>
>>
>> On Thu, May 9, 2019 at 4:12 AM keshava 
>> wrote:
>>
>>> Hi, our application is running into a data corruption issue.
>>> The application uses Cassandra 2.1.11 with DataStax Java driver version
>>> 2.1.9. So far all was working fine. Recently we changed our deployment
>>> environment to OpenJDK 1.7.191 (earlier it was 1.7.181) and CentOS 7.4
>>> (earlier 6.8). This is randomly happening for one table: 1 in every 4-5
>>> entries is getting corrupted. Writing new entries will return success, but
>>> when I try to read I get data not found. When I list all the data available
>>> in the table using cqlsh I see garbage entries like
>>> corrupted writing new entries will return success and when i try to 
>>> read i
>>> get data  not found .whenist all the data available in the table using
>>> cqlsh i see garbage entries like
>>>
>>>
>>> \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>>>
>>> here is the output of the cqlsh
>>>
>>> cqlsh:ccp> select id from socialcontact;
>>>
>>> id
>>> -->
>>>
>>> 9BA31AE3116A097C3F57FEF9 9BA10FB2116A00103F57FEF9
>>> \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>>> \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>>> \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>>> \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>>> 9BA3236C116A09E63F57FEF9 9BA32536116A09FC3F57FEF9
>>> \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>>> \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00.
>>>
>>> I did enable query tracing on both the Cassandra server and the driver and
>>> didn't notice any differences. I am looking for any advice on resolving this
>>> issue.
>>>
>>> PS: I did try upgrading Cassandra to the latest in the 2.1 train but it
>>> didn't help.
>>>
>>>


Re: How to set up a cluster with allocate_tokens_for_keyspace?

2019-05-05 Thread Anthony Grasso
Hi

If you are planning on setting up a new cluster with
allocate_tokens_for_keyspace, then yes, you will need one seed node per
rack. As Jon mentioned in a previous email, you must manually specify the
token range for *each* seed node. This can be done using the initial_token
setting.

The article you are referring to (
https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html)
includes Python code which calculates the token ranges for each of the seed
nodes. When calling that Python code, you must specify the number of vnodes
(tokens per node) and the number of racks.
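
A minimal sketch of that calculation, consistent with the script in the
post (assuming one seed node per rack, with the evenly spaced tokens
interleaved across the racks):

    num_tokens = 4  # vnodes per node
    num_racks = 3   # one seed node per rack

    # spread num_tokens * num_racks evenly spaced tokens round-robin over the racks
    for r in range(num_racks):
        tokens = [((2**64 // (num_tokens * num_racks)) * (t * num_racks + r)) - 2**63
                  for t in range(num_tokens)]
        print("[Rack #{}] initial_token: {}".format(r + 1, ",".join(map(str, tokens))))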

Regards,
Anthony

On Sat, 4 May 2019 at 19:14, onmstester onmstester
 wrote:

> I just read this article by tlp:
>
> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
>
> Noticed that:
> >>We will need to set the tokens for the seed nodes in each rack
> manually. This is to prevent each node from randomly calculating its own
> token ranges
>
>  But until now, i was using this recommendation to setup a new cluster:
> >>
>
> You'll want to set them explicitly using:
> python -c 'print( [str(((2**64 / 4) * i) - 2**63) for i in range(4)])'
>
>
> After you fire up the first seed, create a keyspace using RF=3 (or whatever 
> you're planning on using) and set allocate_tokens_for_keyspace to that 
> keyspace in your config, and join the rest of the nodes. That gives even
> distribution.
>
> I've defined plenty of racks in my cluster (and only 3 seed nodes), should
> I have a seed node per rack and use initial_token for all of the seed nodes,
> or would just one seed node with initial_token be OK?
> Best Regards
>
>
>


Re: How to set up a cluster with allocate_tokens_for_keyspace?

2019-05-05 Thread Anthony Grasso
Good idea Jeff. I can add that in if you like? Do we have a ticket for it
or should I just raise one?

On Mon, 6 May 2019 at 03:49, Jeff Jirsa  wrote:

> Picking an ideal allocation for N seed nodes and M vnodes per seed is
> probably something we should add as a little python script or similar in
> /tools/ to make this easier. Then let the auto allocation stuff kick in
> after that.
>
>
> > On May 5, 2019, at 8:23 AM, Jon Haddad  wrote:
> >
> > I mean you'd want to set up the initial tokens for the first 3 nodes
> > of your cluster, which are usually the seed nodes.
> >
> >
> > On Sat, May 4, 2019 at 8:31 PM onmstester onmstester
> >  wrote:
> >>
> >> So do you mean setting tokens for only one node (one of the seed nodes)
> is fair enough?
> >> I can not see any problem with this mechanism (only one manual token
> assignment at cluster set up), but the article was also trying to set up a
> balanced cluster, and the way that it insists on doing manual token
> assignment for multiple seed nodes confused me.
> >>
> >> Sent using Zoho Mail
> >>
> >>
> >>
> >>  Forwarded message 
> >> From: Jon Haddad 
> >> To: 
> >> Date: Sat, 04 May 2019 22:10:39 +0430
> >> Subject: Re: How to set up a cluster with allocate_tokens_for_keyspace?
> >>  Forwarded message 
> >>
> >> That line is only relevant for when you're starting your cluster and
> >> you need to define your initial tokens in a non-random way. Random
> >> token distribution doesn't work very well when you only use 4 tokens.
> >>
> >> Once you get the cluster set up you don't need to specify tokens
> >> anymore, you can just use allocate_tokens_for_keyspace.
> >>
> >> On Sat, May 4, 2019 at 2:14 AM onmstester onmstester
> >>  wrote:
> >>>
> >>> I just read this article by tlp:
> >>>
> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
> >>>
> >>> Noticed that:
> > We will need to set the tokens for the seed nodes in each rack
> manually. This is to prevent each node from randomly calculating its own
> token ranges
> >>>
> >>> But until now, i was using this recommendation to setup a new cluster:
> >
> >>>
> >>> You'll want to set them explicitly using: python -c 'print(
> [str(((2**64 / 4) * i) - 2**63) for i in range(4)])'
> >>>
> >>>
> >>> After you fire up the first seed, create a keyspace using RF=3 (or
> whatever you're planning on using) and set allocate_tokens_for_keyspace to
> that keyspace in your config, and join the rest of the nodes. That gives
> even
> >>> distribution.
> >>>
> >>> I've defined plenty of racks in my cluster (and only 3 seed nodes),
> should I have a seed node per rack and use initial_token for all of the
> seed nodes, or would just one seed node with initial_token be OK?
> >>>
> >>> Best Regards
> >>>
> >>>
> >>
> >>
> >>
> >>
> >
> >
>
>
>


Re: Re: Re: how to configure the Token Allocation Algorithm

2019-05-05 Thread Anthony Grasso
Hi Jean,

Good question. I think that sentence is slightly confusing and here is why:

If the cluster's tokens are already evenly distributed and there are no
plans to expand the cluster, then applying the allocate_tokens_for_keyspace
setting has no real practical value.

If the cluster has tokens that are unevenly distributed and there are plans
to expand the cluster, then it may be worth using the
allocate_tokens_for_keyspace setting when adding a new node to the cluster.

Looking back on that sentence, I think it should probably read:

*"However, therein lies the problem, for existing clusters using this
> setting is easy, as a keyspace already exists"*


If you think that wording gives better clarification, I'll go back and
update the post when I have time. Let me know what you think.

Regards,
Anthony

On Mon, 29 Apr 2019 at 18:45, Jean Carlo  wrote:

> Hello Anthony,
>
> Effectively, I did not start the seeds of every rack first. Thank you for
> the post. I believe this is something important to have as official
> documentation on cassandra.apache.org. This issue, as many others, is not
> documented properly.
>
> Of course I find The Last Pickle blog very useful in these matters, but
> having proper documentation on how to start a fresh new Cassandra cluster
> is basic.
>
> I have one question about your post, when you mention
> "*However, therein lies the problem, for existing clusters updating this
> setting is easy, as a keyspace already exists*"
> What is the interest in using allocate_tokens_for_keyspace in a cluster
> with data if the tokens are already distributed? In the worst case
> scenario, the cluster is already unbalanced.
>
>
> Cheers
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>
>
> On Mon, Apr 29, 2019 at 2:45 AM Anthony Grasso 
> wrote:
>
>> Hi Jean,
>>
>> It sounds like there are no nodes in one of the racks for the eu-west-3
>> datacenter. What does the output of nodetool status look like currently?
>>
>> Note, you will need to start a node in each rack before creating the
>> keyspace. I wrote a blog post with the procedure to set up a new cluster
>> using the predictive token allocation algorithm:
>> http://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
>>
>> Regards,
>> Anthony
>>
>> On Fri, 26 Apr 2019 at 19:53, Jean Carlo 
>> wrote:
>>
>>> Creating a fresh new cluster in AWS using this procedure, I got this
>>> problem while bootstrapping the second rack of a cluster of 6
>>> machines with 3 racks and a keyspace with RF 3:
>>>
>>> WARN  [main] 2019-04-26 11:37:43,845 TokenAllocation.java:63 - Selected
>>> tokens [-5106267594614944625, 623001446449719390, 7048665031315327212,
>>> 3265006217757525070, 5054577454645148534, 314677103601736696,
>>> 7660890915606146375, -5329427405842523680]
>>> ERROR [main] 2019-04-26 11:37:43,860 CassandraDaemon.java:749 - Fatal
>>> configuration error
>>> org.apache.cassandra.exceptions.ConfigurationException: Token allocation
>>> failed: the number of racks 2 in datacenter eu-west-3 is lower than its
>>> replication factor 3.
>>>
>>> Someone got this problem ?
>>>
>>> I am not quite sure why I have this, since my cluster has 3 racks.
>>>
>>> Cluster Information:
>>> Name: test
>>> Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
>>> DynamicEndPointSnitch: enabled
>>> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>>> Schema versions:
>>> 3bf63440-fad7-3371-9c14-4855ad11ee83: [192.0.0.1, 192.0.0.2]
>>>
>>>
>>>
>>> Jean Carlo
>>>
>>> "The best way to predict the future is to invent it" Alan Kay
>>>
>>>
>>> On Thu, Jan 24, 2019 at 10:32 AM Ahmed Eljami 
>>> wrote:
>>>
>>>> Hi folks,
>>>>
>>>> What about adding new keyspaces in the existing cluster, test_2 with
>>>> the same RF.
>>>>
>>>> Will it use the same logic as the existing keyspace test? Or should I
>>>> restart nodes and add the new keyspace to the cassandra.yaml?
>>>>
>>>> Thanks.
>>>>
>>>> Le mar. 2 oct. 2018 à 10:28, Varun Barala  a
>>>> écrit :
>>>>
>>>>> Hi,
>>>>>
>>>>> Managing `initial_token` by yourself will give you more control over
>>>>> scale-in and scale-out.
>>>>> Let's s

Re: Re: Re: how to configure the Token Allocation Algorithm

2019-04-28 Thread Anthony Grasso
Hi Jean,

It sounds like there are no nodes in one of the racks for the eu-west-3
datacenter. What does the output of nodetool status look like currently?

Note, you will need to start a node in each rack before creating the
keyspace. I wrote a blog post with the procedure to set up a new cluster
using the predictive token allocation algorithm:
http://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html

Regards,
Anthony

On Fri, 26 Apr 2019 at 19:53, Jean Carlo  wrote:

> Creating a fresh new cluster in AWS using this procedure, I got this
> problem while bootstrapping the second rack of a cluster of 6
> machines with 3 racks and a keyspace with RF 3:
>
> WARN  [main] 2019-04-26 11:37:43,845 TokenAllocation.java:63 - Selected
> tokens [-5106267594614944625, 623001446449719390, 7048665031315327212,
> 3265006217757525070, 5054577454645148534, 314677103601736696,
> 7660890915606146375, -5329427405842523680]
> ERROR [main] 2019-04-26 11:37:43,860 CassandraDaemon.java:749 - Fatal
> configuration error
> org.apache.cassandra.exceptions.ConfigurationException: Token allocation
> failed: the number of racks 2 in datacenter eu-west-3 is lower than its
> replication factor 3.
>
> Someone got this problem ?
>
> I am not quite sure why I have this, since my cluster has 3 racks.
>
> Cluster Information:
> Name: test
> Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
> DynamicEndPointSnitch: enabled
> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> Schema versions:
> 3bf63440-fad7-3371-9c14-4855ad11ee83: [192.0.0.1, 192.0.0.2]
>
>
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>
>
> On Thu, Jan 24, 2019 at 10:32 AM Ahmed Eljami 
> wrote:
>
>> Hi folks,
>>
>> What about adding new keyspaces in the existing cluster, test_2 with the
>> same RF.
>>
>> Will it use the same logic as the existing keyspace test? Or should I
>> restart nodes and add the new keyspace to the cassandra.yaml?
>>
>> Thanks.
>>
>> Le mar. 2 oct. 2018 à 10:28, Varun Barala  a
>> écrit :
>>
>>> Hi,
>>>
>>> Managing `initial_token` by yourself will give you more control over
>>> scale-in and scale-out.
>>> Let's say you have a three-node cluster with `num_token: 1`
>>>
>>> And your initial range looks like:-
>>>
>>> Datacenter: datacenter1
>>> ==
>>> AddressRackStatus State   LoadOwns
>>>  Token
>>>
>>>  3074457345618258602
>>>
>>> 127.0.0.1  rack1   Up Normal  98.96 KiB   66.67%
>>>  -9223372036854775808
>>> 127.0.0.2  rack1   Up Normal  98.96 KiB   66.67%
>>>  -3074457345618258603
>>> 127.0.0.3  rack1   Up Normal  98.96 KiB   66.67%
>>>  3074457345618258602
>>>
>>> Now let's say you want to scale out the cluster to twice the current
>>> throughput(means you are adding 3 more nodes)
>>>
>>> If you are using AWS EBS volumes then you can use the same volumes and
>>> spin three more nodes by selecting midpoints of existing ranges which means
>>> your new nodes already have data.
>>> Once you have mounted volumes on your new nodes:-
>>> * You need to delete every system table except schema related tables.
>>> * You need to generate system/local table by yourself which has
>>> `Bootstrap state` as completed and schema-version same as other existing
>>> nodes.
>>> * You need to remove extra data on all the machines using cleanup
>>> commands
>>>
>>> This is how you can scale out a Cassandra cluster in minutes. In case
>>> you want to add nodes one by one, then you need to write some small tool
>>> which will always figure out the biggest range in the existing cluster and
>>> split it in half (see the sketch below).
>>>
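>>> A rough sketch of such a tool (assuming Murmur3 tokens; the example ring
>>> below is hypothetical):
>>>
>>>     def next_token(tokens):
>>>         # find the widest gap between consecutive tokens, including the
>>>         # wrap-around from the last token back to the first, and split it
>>>         ts = sorted(tokens)
>>>         gaps = [(ts[(i + 1) % len(ts)] - ts[i]) % 2**64 for i in range(len(ts))]
>>>         i = gaps.index(max(gaps))
>>>         mid = ts[i] + gaps[i] // 2
>>>         # keep the result inside the Murmur3 range [-2**63, 2**63)
>>>         return (mid + 2**63) % 2**64 - 2**63
>>>
>>>     print(next_token([-9223372036854775808, -3074457345618258603,
>>>                       3074457345618258602]))
>>>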
>>> However, I never tested it thoroughly but this should work conceptually.
>>> So here we are taking advantage of the fact that we have volumes (data) for
>>> the new nodes beforehand, so we do not need to bootstrap them.
>>>
>>> Thanks & Regards,
>>> Varun Barala
>>>
>>> On Tue, Oct 2, 2018 at 2:31 PM onmstester onmstester <
>>> onmstes...@zoho.com> wrote:
>>>


 Sent using Zoho Mail 


  On Mon, 01 Oct 2018 18:36:03 +0330 *Alain RODRIGUEZ
 >* wrote 

 Hello again :),

 I thought a little bit more about this question, and I was actually
 wondering if something like this would work:

 Imagine 3 node cluster, and create them using:
 For the 3 nodes: `num_token: 4`
 Node 1: `intial_token: -9223372036854775808, -4611686018427387905, -2,
 4611686018427387901`
 Node 2: `intial_token: -7686143364045646507, -3074457345618258604,
 1537228672809129299, 6148914691236517202`
 Node 3: `intial_token: -6148914691236517206, -1537228672809129303,
 3074457345618258600, 7686143364045646503`

  If you know the initial size of your cluster, you can calculate the
 total number of tokens: number of nodes * vnodes and use the

Re: Cassandra 2.1.18 - NPE during startup

2019-04-12 Thread Anthony Grasso
Hi Thomas,

The process you suggested to get around the issue should work with the
system.schema_keyspaces table.

Make sure to back up the original *system.schema_keyspaces* table files on the
node that fails to start. Then, copy only the *system.schema_keyspaces* table
files from a working node into the *system/schema_keyspaces-...* folder of the
node that fails to start.

This method will only work for certain system tables as some of the data
stored in the system tables will differ between nodes.

Regards,
Anthony

On Thu, 28 Mar 2019 at 05:54, Steinmaurer, Thomas <
thomas.steinmau...@dynatrace.com> wrote:

> Hello,
>
>
>
> any ideas regarding below, cause it happened again on a different node.
>
>
>
> Thanks
>
> Thomas
>
>
>
> *From:* Steinmaurer, Thomas 
> *Sent:* Dienstag, 05. Februar 2019 23:03
> *To:* user@cassandra.apache.org
> *Subject:* Cassandra 2.1.18 - NPE during startup
>
>
>
> Hello,
>
>
>
> at a particular customer location, we are seeing the following NPE during
> startup with Cassandra 2.1.18.
>
>
>
> INFO  [SSTableBatchOpen:2] 2019-02-03 13:32:56,131 SSTableReader.java:475
> - Opening
> /var/opt/data/cassandra/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_keyspaces-ka-130
> (256 bytes)
>
> ERROR [main] 2019-02-03 13:32:56,552 CassandraDaemon.java:583 - Exception
> encountered during startup
>
> org.apache.cassandra.io.FSReadError: java.lang.NullPointerException
>
> at
> org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:672)
> ~[apache-cassandra-2.1.18.jar:2.1.18]
>
> at
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:310)
> [apache-cassandra-2.1.18.jar:2.1.18]
>
> at
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566)
> [apache-cassandra-2.1.18.jar:2.1.18]
>
> at
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655)
> [apache-cassandra-2.1.18.jar:2.1.18]
>
> Caused by: java.lang.NullPointerException: null
>
> at
> org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:664)
> ~[apache-cassandra-2.1.18.jar:2.1.18]
>
> ... 3 common frames omitted
>
>
>
> I found https://issues.apache.org/jira/browse/CASSANDRA-10501
> ,
> but this should be fixed in 2.1.18.
>
>
>
> Is the above log stating that it is caused by a system keyspace related
> SSTable?
>
>
>
> This is a 3 node setup with 2 others running fine. If system table related
> and as LocalStrategy is used as replication strategy (to my knowledge),
> perhaps simply copying over data for the schema_keyspaces table from
> another node might fix it?
>
>
>
> Any help appreciated.
>
>
>
> Thanks.
>
> Thomas
>
>


Re: Topology settings before/after decommission node

2019-04-10 Thread Anthony Grasso
Hi Robert,

Your action plan looks good.

You can think of the *cassandra-topology.properties* file as a map for the
cluster. The map must be consistent between the nodes because each node
uses it to determine where it is meant to be located logically.

It is good hygiene to maintain the *cassandra-topology.properties* so it
contains only the IPs (broadcast addresses) currently used in the cluster.
Technically, you could leave the entry for the decommissioned node in
there. The problem is that if that IP address is later reused by a new
node, it will be placed back in DC1, which could be the wrong logical
placement for it. So I would advise removing the address once the node is
inactive, as shown in the sketch below.
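
For example, if the decommissioned node was one of the DC1 entries (all
addresses below are illustrative), the file on every remaining node would
simply omit its line:

# cassandra-topology.properties after the decommission
10.0.1.1=DC1:RAC1
10.0.1.2=DC1:RAC1
10.0.1.3=DC1:RAC1
10.0.2.1=DC2:RAC1
10.0.2.2=DC2:RAC1
10.0.2.3=DC2:RAC1

default=DC1:r1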

For step 3, there is no need to restart all the nodes. However if you do
want them to reload the configuration you will need to perform a rolling
restart on the cluster (i.e. restart one node at a time).

Regards,
Anthony

On Thu, 11 Apr 2019 at 03:38, rastrent 
wrote:

> Hi there,
>
> I am running a cassandra cluster (v3.0.9) with 2 DCs (4/3 nodes
> respectively) using endpoint_snitch: PropertyFileSnitch and I would like to
> decommission one node in DC1, but I wonder what kind of actions I need
> to take related to the topology settings.
> My cassandra-topology.properties has the simple settings below:
>
> x.x.x.x=DC1:RAC1
> x.x.x.x=DC1:RAC1
> x.x.x.x=DC1:RAC1
> x.x.x.x=DC1:RAC1
> x.x.x.x=DC2:RAC1
> x.x.x.x=DC2:RAC1
> x.x.x.x=DC2:RAC1
>
> default=DC1:r1
>
> My action plan is to:
>
> 1) Decommission a node in DC1
> 2) After the node leaves the cluster, edit cassandra-topology.properties
> in every node in the cluster
> 3) Question: Now do I need to restart all nodes in the cluster? (one at a
> time of course)
>
> Bonus question: Do I need to change the cassandra-topology.properties
> before moving/removing nodes?
>
> Cheers,
>
> Robert,
>
>
> Sent with ProtonMail  Secure Email.
>
>


Re: All time blocked in nodetool tpstats

2019-04-10 Thread Anthony Grasso
Hi Abdul,

Usually we see no noticeable improvement from tuning concurrent_reads and
concurrent_writes above 128. I generally try to keep concurrent_reads no
higher than 64 and concurrent_writes no higher than 128. Increasing the
values beyond that, you might start running into issues where the kernel IO
scheduler and/or the disk become saturated. As Paul mentioned, it will
depend on the size of your nodes though.

If the client is timing out, it is likely that the node that is selected as
the coordinator for the read has a resource contention somewhere. The root
cause is usually due to a number of things going on though. As Paul
mentioned, one of the issues could be the query design. It is worth
investigating if a particular read query is timing out.

I would also inspect the Cassandra logs and garbage collection logs on the
node where you are seeing the timeouts. The things to look out for are high
garbage collection frequency, long garbage collection pauses, and high
tombstone read warnings.
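
A few quick checks along those lines (log locations vary between installs;
the paths here are examples):

# Thread pool saturation and dropped messages on the suspect node
nodetool tpstats

# Long or frequent GC pauses reported by Cassandra
grep GCInspector /var/log/cassandra/system.log | tail -n 20

# Tombstone read warnings
grep -i tombstone /var/log/cassandra/system.log | tail -n 20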

Regards,
Anthony

On Thu, 11 Apr 2019 at 06:01, Abdul Patel  wrote:

> Yes, the queries are all select queries as it is more of a read intensive
> app.
> Last night I rebooted the cluster and today they are fine (I know it's
> temporary) as I still see all time blocked values.
> I am thinking of increasing concurrent
>
> On Wednesday, April 10, 2019, Paul Chandler  wrote:
>
>> Hi Abdul,
>>
>> When I have seen dropped messages, I normally double check to ensure the
>> node not CPU bound.
>>
>> If you have a high CPU idle value, then it is likely that tuning the
>> thread counts will help.
>>
>> I normally start with concurrent_reads and concurrent_writes, so in your
>> case as reads are being dropped then increase concurrent_reads, I normally
>> change it to 96 to start with, but it will depend on size of your nodes.
>>
>> Otherwise it might be badly designed queries, have you investigated which
>> queries are producing the client timeouts?
>>
>> Regards
>>
>> Paul Chandler
>>
>>
>>
>> > On 9 Apr 2019, at 18:58, Abdul Patel  wrote:
>> >
>> > Hi,
>> >
>> > My nodetool tpstats is showing high all time blocked numbers and also
>> read dropped messages at 400.
>> > Clients are experiencing high timeouts.
>> > Checked a few online forums; they recommend increasing
>> native_transport_max_threads.
>> > As of now it's commented out with 128.
>> > Is it advisable to increase this, and can this fix the timeout issue?
>> >
>>
>>
>>
>>


Re: Assassinate fails

2019-04-03 Thread Anthony Grasso
Hi Alex,

We wrote a blog post on this topic late last year:
http://thelastpickle.com/blog/2018/09/18/assassinate.html.

In short, you will need to run the assassinate command on each node
simultaneously a number of times in quick succession. This will generate a
number of messages requesting all nodes completely forget there used to be
an entry within the gossip state for the given IP address.
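
A rough sketch of that, run on every affected node at the same time (the IP
is the dead node's address from the nodetool status output below):

# Repeat the call in quick succession on each node simultaneously
for i in 1 2 3 4 5; do
    nodetool assassinate 192.168.1.18
done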

Regards,
Anthony

On Thu, 4 Apr 2019 at 03:32, Alex  wrote:

> Same result it seems:
> Welcome to JMX terminal. Type "help" for available commands.
> $>open localhost:7199
> #Connection to localhost:7199 is opened
> $>bean org.apache.cassandra.net:type=Gossiper
> #bean is set to org.apache.cassandra.net:type=Gossiper
> $>run unsafeAssassinateEndpoint 192.168.1.18
> #calling operation unsafeAssassinateEndpoint of mbean
> org.apache.cassandra.net:type=Gossiper
> #RuntimeMBeanException: java.lang.NullPointerException
>
>
> There not much more to see in log files :
> WARN  [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:13,626
> Gossiper.java:575 - Assassinating /192.168.1.18 via gossip
> INFO  [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:13,627
> Gossiper.java:585 - Sleeping for 30000ms to ensure /192.168.1.18 does
> not change
> INFO  [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:43,628
> Gossiper.java:1029 - InetAddress /192.168.1.18 is now DOWN
> INFO  [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:43,631
> StorageService.java:2324 - Removing tokens [..] for /192.168.1.18
>
>
>
>
> Le 03.04.2019 17:10, Nick Hatfield a écrit :
> > Run assassinate the old way. It works very well...
> >
> > wget -q -O jmxterm.jar
> >
> http://downloads.sourceforge.net/cyclops-group/jmxterm-1.0-alpha-4-uber.jar
> >
> > java -jar ./jmxterm.jar
> >
> > $>open localhost:7199
> >
> > $>bean org.apache.cassandra.net:type=Gossiper
> >
> > $>run unsafeAssassinateEndpoint 192.168.1.18
> >
> > $>quit
> >
> >
> > Happy deleting
> >
> > -Original Message-
> > From: Alex [mailto:m...@aca-o.com]
> > Sent: Wednesday, April 03, 2019 10:42 AM
> > To: user@cassandra.apache.org
> > Subject: Assassinate fails
> >
> > Hello,
> >
> > Short story:
> > - I had to replace a dead node in my cluster
> > - 1 week after, dead node is still seen as DN by 3 out of 5 nodes
> > - dead node has null host_id
> > - assassinate on dead node fails with error
> >
> > How can I get rid of this dead node ?
> >
> >
> > Long story:
> > I had a 3 nodes cluster (Cassandra 3.9) ; one node went dead. I built
> > a new node from scratch and "replaced" the dead node using the
> > information from this page
> >
> https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsReplaceNode.html
> .
> > It looked like the replacement went ok.
> >
> > I added two more nodes to strengthen the cluster.
> >
> > A few days have passed and the dead node is still visible and marked
> > as "down" on 3 of 5 nodes in nodetool status:
> >
> > --  Address   Load   Tokens   Owns (effective)  Host ID
> >   Rack
> > UN  192.168.1.9   16 GiB 256  35.0%
> > 76223d4c-9d9f-417f-be27-cebb791cddcc  rack1
> > UN  192.168.1.12  16.09 GiB  256  34.0%
> > 719601e2-54a6-440e-a379-c9cf2dc20564  rack1
> > UN  192.168.1.14  14.16 GiB  256  32.6%
> > d8017a03-7e4e-47b7-89b9-cd9ec472d74f  rack1
> > UN  192.168.1.17  15.4 GiB   256  34.1%
> > fa238b21-1db1-47dc-bfb7-beedc6c9967a  rack1
> > DN  192.168.1.18  24.3 GiB   256  33.7% null
> >   rack1
> > UN  192.168.1.22  19.06 GiB  256  30.7%
> > 09d24557-4e98-44c3-8c9d-53c4c31066e1  rack1
> >
> > Its host ID is null, so I cannot use nodetool removenode. Moreover
> > nodetool assassinate 192.168.1.18 fails with :
> >
> > error: null
> > -- StackTrace --
> > java.lang.NullPointerException
> >
> > And in system.log:
> >
> > INFO  [RMI TCP Connection(16)-127.0.0.1] 2019-03-27 17:39:38,595
> > Gossiper.java:585 - Sleeping for 30000ms to ensure /192.168.1.18 does
> > not change INFO  [CompactionExecutor:547] 2019-03-27 17:39:38,669
> > AutoSavingCache.java:393 - Saved KeyCache (27316 items) in 163 ms INFO
> >  [IndexSummaryManager:1] 2019-03-27 17:40:03,620
> > IndexSummaryRedistribution.java:75 - Redistributing index summaries
> > INFO  [RMI TCP Connection(16)-127.0.0.1] 2019-03-27 17:40:08,597
> > Gossiper.java:1029 - InetAddress /192.168.1.18 is now DOWN INFO  [RMI
> > TCP Connection(16)-127.0.0.1] 2019-03-27 17:40:08,599
> > StorageService.java:2324 - Removing tokens [-1061369577393671924,...]
> > ERROR [GossipStage:1] 2019-03-27 17:40:08,600 CassandraDaemon.java:226
> > - Exception in thread Thread[GossipStage:1,5,main]
> > java.lang.NullPointerException: null
> >
> >
> > In system.peers, the dead node shows and has the same ID as the
> > replacing node :
> >
> > cqlsh> select peer, host_id from system.peers;
> >
> >   peer | host_id
> > --+--
> >   192.168.1.18 

Re: Bind keyspace to specific data directory

2018-07-16 Thread Anthony Grasso
Hi Abdul,

There is no mechanism offered in Cassandra to bind a keyspace (when
created) to specific filesystem or directory. If multiple filesystems or
directories are specified in the data_file_directories property in the
*cassandra.yaml* then Cassandra will attempt to evenly distribute data from
all keyspaces across them.

Cassandra places table directories for each keyspace in a folder under the
path(s) specified in the data_file_directories property. That is, if the
data_file_directories property was set to */var/lib/cassandra/data* and
keyspace "foo" was created, Cassandra would create the directory
*/var/lib/cassandra/data/foo*.

One possible way to bind a keyspace to a particular file system is to
create a custom mount point that has the same path as the keyspace. For
example, if you had a particular volume that you wanted to use for keyspace
"foo", you could do something like:

sudo mount /dev/<device> /var/lib/cassandra/data/foo

Note that you would probably need to do this after the keyspace is created
and before the tables are created. This setup would mean that all
reads/writes for tables in keyspace "foo" would touch that volume.
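
A fuller sketch of those steps, assuming a dedicated device /dev/xvdf (the
device name is illustrative):

# After creating keyspace "foo" and before creating any of its tables
sudo mkdir -p /var/lib/cassandra/data/foo
sudo mount /dev/xvdf /var/lib/cassandra/data/foo
sudo chown -R cassandra:cassandra /var/lib/cassandra/data/foo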

Regards,
Anthony

On Tue, 3 Jul 2018 at 07:02, Abdul Patel  wrote:

> Hi
>
> Can we bind or specify, while creating a keyspace, a specific
> filesystem or directory for it to write to?
> I see we can split data over multiple filesystems, but can we decide which
> filesystem a particular keyspace reads from and writes to?
>


Re: Check Cluster Health

2018-07-04 Thread Anthony Grasso
Hi,

Yes, you can use nodetool status to inspect the health/status of the
cluster. Using *nodetool status <keyspace>* will show the cluster
health/status as well as the amount of data that each node has for the
specified *<keyspace>*. Using *nodetool status* without the <keyspace>
argument will only show the cluster health/status.

Unless there is a special reason for using nodetool to capture history, you
may want to consider using metric libraries to capture and push information
about each node to a metric server. It is much easier to view the data
captured on the metric server as there are tools already made for this.
Using metrics libraries will save you time creating and maintaining a
parser for the nodetool output. It also makes monitoring the health of
cluster very easy.

Regards,
Anthony

On Sun, 1 Jul 2018 at 20:19, Thouraya TH  wrote:

> Hi,
> Thank you so much for answer.
> Please, is it possible to use this command ?
>
> nodetool status mykeyspace
>
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  AddressLoad   Tokens  OwnsHost ID 
>   Rack
> UN  127.0.0.1  47.66 KB   1   33.3%   
> aaa1b7c1-6049-4a08-ad3e-3697a0e30e10  rack1
> UN  127.0.0.2  47.67 KB   1   33.3%   
> 1848c369-4306-4874-afdf-5c1e95b8732e  rack1
> UN
>
> Thank you so much.
> Kind regards.
>
> 2018-06-29 1:40 GMT+01:00 Rahul Singh :
>
>>
>>
>> When you run TPstats or Tablestats subcommands in nodetool you are
>> actually accessing data inside Cassandra via JMX.
>>
>> You can start there at first.
>>
>> Rahul
>> On Jun 28, 2018, 10:55 AM -0500, Thouraya TH ,
>> wrote:
>>
>> Hi,
>>
>> Please, how can I check the health of my cluster / data center using
>> Cassandra?
>> In fact I'd like to generate a history of the state of each node: a
>> history of the failures of my cluster (20% of failures in a day, 40% of
>> failures in a day, etc...)
>>
>> Thank you so much.
>> Kind regards.
>>
>>
>


Re: replace dead node vs remove node

2018-03-22 Thread Anthony Grasso
Hi Peng,

Correct, you would want to repair in either case.

Regards,
Anthony


On Fri, 23 Mar 2018 at 14:09, Peng Xiao <2535...@qq.com> wrote:

> Hi Anthony,
>
> There is a problem with replacing a dead node as per the blog: if the
> replacement process takes longer than max_hint_window_in_ms, we must run
> repair to make the replaced node consistent again, since it missed ongoing
> writes during bootstrapping. But for a large cluster, repair is a painful
> process.
>
> Thanks,
> Peng Xiao
>
>
>
> ------ Original Message --
> *From:* "Anthony Grasso"<anthony.gra...@gmail.com>;
> *Sent:* Thursday, 22 March 2018 at 7:13 PM
> *To:* "user"<user@cassandra.apache.org>;
> *Subject:* Re: replace dead node vs remove node
>
> Hi Peng,
>
> Depending on the hardware failure you can do one of two things:
>
> 1. If the disks are intact and uncorrupted you could just use the disks
> with the current data on them in the new node. Even if the IP address
> changes for the new node that is fine. In that case all you need to do is
> run repair on the new node. The repair will fix any writes the node missed
> while it was down. This process is similar to the scenario in this blog
> post:
> http://thelastpickle.com/blog/2018/02/21/replace-node-without-bootstrapping.html
>
> 2. If the disks are inaccessible or corrupted, then use the method as
> described in the blogpost you linked to. The operation is similar to
> bootstrapping a new node. There is no need to perform any other remove or
> join operation on the failed or new nodes. As per the blog post, you
> definitely want to run repair on the new node as soon as it joins the
> cluster. In this case here, the data on the failed node is effectively lost
> and replaced with data from other nodes in the cluster.
>
> Hope this helps.
>
> Regards,
> Anthony
>
>
> On Thu, 22 Mar 2018 at 20:52, Peng Xiao <2535...@qq.com> wrote:
>
>> Dear All,
>>
>> When one node fails with hardware errors, it will be in DN status in the
>> cluster. Then if we are not able to handle this error within three hours
>> (max hints window), we will lose data, right? We have to run repair to
>> keep the consistency.
>> And as per
>> https://blog.alteroot.org/articles/2014-03-12/replace-a-dead-node-in-cassandra.html,
>> we can replace this dead node. Is it the same as bootstrapping a new
>> node? That means we don't need to remove the node and rejoin?
>> Could anyone please advise?
>>
>> Thanks,
>> Peng Xiao
>>
>>
>>
>>
>>


Re: replace dead node vs remove node

2018-03-22 Thread Anthony Grasso
Hi Peng,

Depending on the hardware failure you can do one of two things:

1. If the disks are intact and uncorrupted you could just use the disks
with the current data on them in the new node. Even if the IP address
changes for the new node that is fine. In that case all you need to do is
run repair on the new node. The repair will fix any writes the node missed
while it was down. This process is similar to the scenario in this blog
post:
http://thelastpickle.com/blog/2018/02/21/replace-node-without-bootstrapping.html

2. If the disks are inaccessible or corrupted, then use the method as
described in the blogpost you linked to. The operation is similar to
bootstrapping a new node. There is no need to perform any other remove or
join operation on the failed or new nodes. As per the blog post, you
definitely want to run repair on the new node as soon as it joins the
cluster. In this case here, the data on the failed node is effectively lost
and replaced with data from other nodes in the cluster.

Hope this helps.

Regards,
Anthony


On Thu, 22 Mar 2018 at 20:52, Peng Xiao <2535...@qq.com> wrote:

> Dear All,
>
> When one node fails with hardware errors, it will be in DN status in the
> cluster. Then if we are not able to handle this error within three hours
> (max hints window), we will lose data, right? We have to run repair to
> keep the consistency.
> And as per
> https://blog.alteroot.org/articles/2014-03-12/replace-a-dead-node-in-cassandra.html,
> we can replace this dead node. Is it the same as bootstrapping a new
> node? That means we don't need to remove the node and rejoin?
> Could anyone please advise?
>
> Thanks,
> Peng Xiao
>
>
>
>
>


Re: Cassandra vs MySQL

2018-03-14 Thread Anthony Grasso
Hi Oliver,

I was in a similar situation to you and Matija a few years back as well and
can vouch for what Matija has said. Some data sets are more suitable for
Cassandra than others; so the answer to your question depends on the type
of data and how it is modelled in Cassandra. The data model will affect
performance and how the cluster expands over time.

The application(s) connecting to the database will need to be modified to
at least call the cluster and possibly to perform some of the operations
that MySQL performed (e.g. joins). This means that any sort of benchmark
would have use the full system end-to-end. If you did decide to benchmark
your system with MySQL and then with Cassandra, it is best to use a full
production data load. This is because the data model used in Cassandra will
affect the system performance characteristics as the data grows.

Kind regards,
Anthony


On Tue, 13 Mar 2018 at 07:29, Matija Gobec  wrote:

> Hi Oliver,
>
> Few years back I had a similar problem where there was a lot of data in
> MySQL and it was starting to choke. I migrated data to Cassandra, ran
> benchmarks and blew MySQL out of the water with a small 3 node C* cluster.
> If you have a use case for Cassandra the answer is yes, but keep in mind
> that there are some use cases like relational problems which can be hard to
> solve with Cassandra and I tend to keep them in relational database. That
> being said, I don't think you can benchmark these two head to head since
> they basically solve different problems and Cassandra is distributed by
> design.
>
> Best,
> Matija
>
> On Mon, Mar 12, 2018 at 9:27 PM, Gábor Auth  wrote:
>
>> Hi,
>>
>> On Mon, Mar 12, 2018 at 8:58 PM Oliver Ruebenacker 
>> wrote:
>>
>>> We have a project currently using MySQL single-node with 5-6TB of data
>>> and some performance issues, and we plan to add data up to a total size of
>>> maybe 25-30TB.
>>>
>>
>> There is no 'silver bullet', the Cassandra is not a 'drop in' replacement
>> of MySQL. Maybe it will be faster, maybe it will be totally unusable, based
>> on your use-case and database scheme.
>>
>> Is there some good more recent material?
>>>
>>
>> Are you able to completely redesign your database schema? :)
>>
>> Bye,
>> Gábor Auth
>>
>>
>


Re: command to view yaml file setting in use on console

2018-03-12 Thread Anthony Grasso
Hi Kenneth,

In addition to CASSANDRA-7622, it may help to inspect the Cassandra
*system.log* and look for the following entry:

INFO  [main] ... - Node configuration:[...]

The content of "Node configuration" will have the settings the node is
using.

Regards,
Anthony



On Tue, 13 Mar 2018 at 12:50, Kenneth Brotman 
wrote:

> You say the nicest things!
>
>
>
> *From:* Jeff Jirsa [mailto:jji...@gmail.com]
> *Sent:* Monday, March 12, 2018 6:43 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: command to view yaml file setting in use on console
>
>
>
> Cassandra-7622 went patch available today
>
> --
>
> Jeff Jirsa
>
>
>
>
> On Mar 12, 2018, at 6:40 PM, Kenneth Brotman 
> wrote:
>
> Is there a command, perhaps a nodetool command to view the actual yaml
> settings a node is using so you can confirm it is using the changes to a
> yaml file you made?
>
>
>
> Kenneth Brotman
>
>


Re: Slender Cassandra Cluster Project

2018-01-21 Thread Anthony Grasso
Hi Kenneth,

Fantastic idea!

One thing that came to mind from my reading of the proposed setup was rack
awareness of each node. Given that the proposed setup contains three DCs, I
assume that each node will be made rack aware? If not, consider defining
three racks for each DC and placing two nodes in each rack. This will
ensure that all the nodes in a single rack contain at most one replica of
the data.
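
As a sketch of what that could look like for one of the datacenters,
assuming the PropertyFileSnitch is used (addresses are illustrative):

# cassandra-topology.properties entries for DC1: three racks, two nodes each
10.1.0.1=DC1:RAC1
10.1.0.2=DC1:RAC1
10.1.0.3=DC1:RAC2
10.1.0.4=DC1:RAC2
10.1.0.5=DC1:RAC3
10.1.0.6=DC1:RAC3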

Regards,
Anthony

On 17 January 2018 at 11:24, Kenneth Brotman 
wrote:

> Sure.  That takes the project from awesome to 10X awesome.  I absolutely
> would be willing to do that.  Thanks Kurt!
>
>
>
> Regarding your comment on the keyspaces, I agree.  There should be a few
> simple examples one way or the other that can be duplicated and observed,
> and then an example to duplicate and play with that has a nice real world
> mix, with some keyspaces that replicate over only a subset of DC’s and some
> that replicate to all DC’s.
>
>
>
> Kenneth Brotman
>
>
>
> *From:* kurt greaves [mailto:k...@instaclustr.com]
> *Sent:* Tuesday, January 16, 2018 1:31 PM
> *To:* User
> *Subject:* Re: Slender Cassandra Cluster Project
>
>
>
> Sounds like a great idea. Probably would be valuable to add to the
> official docs as an example set up if you're willing.
>
>
>
> Only thing I'd add is that you should have keyspaces that replicate over
> only a subset of DC's, plus one/some replicated to all DC's
>
>
>
> On 17 Jan. 2018 03:26, "Kenneth Brotman" 
> wrote:
>
> I’ve begun working on a reference project intended to provide guidance on
> configuring and operating a modest Cassandra cluster of about 18 nodes
> suitable for the economic study, demonstration, experimentation and testing
> of a Cassandra cluster.
>
>
>
> The slender cluster would be designed to be as inexpensive as possible
> while still using real world hardware in order to lower the cost to those
> with limited initial resources. Sorry no Raspberry Pi’s for this project.
>
>
>
> There would be an on-premises version and a cloud version.  Guidance would
> be provided on configuring the cluster, on demonstrating key Cassandra
> behaviors, on files sizes, capacity to use with the Slender Cassandra
> Cluster, and so on.
>
>
>
> Why eighteen nodes? I tried to figure out the minimum number of
> nodes needed for Cassandra to be Cassandra. Here were my considerations:
>
>
>
> • A user wouldn’t run Cassandra in just one data center; so at
> least two datacenters.
>
> • A user probably would want a third data center available for
> analytics.
>
> • There needs to be enough nodes for enough parallelism to
> observe Cassandra’s distributed nature.
>
> • The cluster should have enough nodes that one gets a sense
> of the need for cluster wide management tools to do things like repairs,
> snapshots and cluster monitoring.
>
> • The cluster should be able to demonstrate a RF=3 with local
> quorum.  If replicated in all three data centers, one write would impact
> half the 18 nodes, 3 datacenters X 3 nodes per data center = 9 nodes of 18
> nodes.  If replicated in two of the data centers, one write would still
> impact one third of the 18 nodes, 2 DC’s X 3 nodes per DC = 6 of the 18
> nodes.
>
>
>
> So eighteen seems like the minimum number of nodes needed.  That’s six
> nodes in each of three data centers.
>
>
>
> Before I get too carried away with this project, I’m looking for some
> feedback on whether this project would indeed be helpful to others? Also,
> should the project be changed in any way?
>
>
>
> It’s always a pleasure to connect with the Cassandra users’ community.
> Thanks for all the hard work, the expertise, the civil dialog.
>
>
>
> Kenneth Brotman
>


Re: [EXTERNAL] Cassandra cluster add new node slowly

2018-01-03 Thread Anthony Grasso
The speed at which compactions operate is also physically restricted by the
speed of the disk. If the disks used on the new node are HDDs, then
increasing the compaction throughput will be of little help. However, if
the disks on the new node are SSDs then increasing the compaction
throughput to at least 64MB/s should help speed up compactions.
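
For example:

# Check the current setting, then raise it (the value is in MB/s)
nodetool getcompactionthroughput
nodetool setcompactionthroughput 64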

Regards,
Anthony

On 4 January 2018 at 14:13, qf zhou  wrote:

> The cassandra version is 3.0.9.
>
> I have changed the heap size (about 32G). Also, the streaming
> throughput is set to 800MB/sec, and the streaming_socket_timeout_in_ms is
> the default 8640.
> I suspect the compactionthroughput has an influence on the new node
> joining. The command 'nodetool getcompactionthroughput' says 'Current
> compaction throughput: 32 MB/s'.
>
>
>
>
> 在 2018年1月4日,上午4:59,Durity, Sean R  写道:
>
> You don’t mention the version, but here are some general suggestions
>
> -  2 GB heap is very small for a node, especially with 1 TB+ of
> data. What is the physical RAM on the host? In general, you want ½ of
> physical RAM for the JVM. (Look in jvm.options or cassandra-env.sh)
> -  You can change the streaming throughput from the existing
> nodes, if it looks like the new node can handle it. Look at nodetool
> setstreamthroughput. Default is 200 (MB/sec).
> -  You might want to check for a streaming_socket_timeout_in_ms.
> This has changed over the versions. Some details are at:
> https://issues.apache.org/jira/browse/CASSANDRA-11839. 24 hours is a good
> recommendation.
> -  If your new node can’t compact fast enough to keep disk usage
> down, look at compactionthroughput on that node
> -  nodetool netstats | grep -v “100%” is a good way to see what
> is happening/if anything is stuck. Newer versions give a bit more info on
> progress.
> -  Don’t forget to run cleanup on existing nodes after the new
> nodes are added.
>
>
>
> Sean Durity
> *From:* qf zhou [mailto:zhouqf2...@gmail.com ]
> *Sent:* Tuesday, January 02, 2018 10:30 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Cassandra cluster add new node slowly
>
> The cluster has  3 nodes,  and  the data in each node is  about 1.2 T.  I
> want to add two new nodes to expand the cluster.
>
> Following the instructions from the datastax website, i.e.
> (http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/operations/opsAddNodeToCluster.html),
>
>
> I tried to add one node to the cluster. However, it is too slow and
> costs too much time. After about 24 hours, it still didn't succeed.
>
> I ran the command 'nodetool netstats' on the new node and it shows that:
>
> (tb1fullwithstate2 is a big table and 90% of the cluster data is in
> it. Here I use the compaction strategy TimeWindowCompactionStrategy.)
>
> /*.*.*.3
> Receiving 136 files, 328573794609 bytes total. Already received 8
> files, 5774621188 bytes total
> tb1/tb1fullwithneweststatetest 3758271/3758271 bytes(100%)
> received from idx:0/*.*.*.3
> system_distributed/repair_history 57534/57534 bytes(100%)
> received from idx:0/*.*.*.3
> system_distributed/parent_repair_history 507660/507660
> bytes(100%) received from idx:0/*.*.*.3
> tb1/tb1_device_last_state_eachday 15754096/15754096
> bytes(100%) received from idx:0/*.*.*.3
> mytest1/tb1_test1 8143775/8143775 bytes(100%) received from
> idx:0/*.*.*.3
> tb1/tb1fullwithstate 2251191007/2251191007 bytes(100%)
> received from idx:0/*.*.*.3
> applocationinfo/weiyirong_app 2760/2760 bytes(100%) received
> from idx:0/*.*.*.3
> tb1/tb1fullwithstate2 3490748006/4909554503 bytes(71%)
> received from idx:0/*.*.*.3
> tb1/tb1fullwithneweststate 4458079/4458079 bytes(100%)
> received from idx:0/*.*.*.3
> /*.*.*.2
> Receiving 136 files, 336762487360 bytes total. Already received 3
> files, 5695770181 bytes total
> system_distributed/repair_history 31684/31684 bytes(100%)
> received from idx:0/*.*.*.2
> tb1/tb1fullwithstate 908260516/908260516 bytes(100%) received
> from idx:0/*.*.*.2
> tb1/tb1fullwithstate2 4783622958/4990450588 bytes(95%)
> received from idx:0/*.*.*.2
> tb1/tb1fullwithneweststate 3855023/3855023 bytes(100%)
> received from idx:0/*.*.*.2
> /*.*.*.4
> Receiving 132 files, 236250553620 bytes total. Already received 10
> files, 3117465128 bytes total
> mytest1/wordstest2 46/46 bytes(100%) received from
> idx:0/*.*.*.4
> tb1/tb1fullwithneweststatetest 

Re: Node Failure Scenario

2017-11-13 Thread Anthony Grasso
Hi Anshu,

To add to Erick's comment, remember to remove the *replace_address* method
from the *cassandra-env.sh* file once the node has rejoined successfully.
The node will fail the next restart otherwise.

Alternatively, use the *replace_address_first_boot* method, which works
exactly the same way as *replace_address*; the only difference is there is
no need to remove it from the *cassandra-env.sh* file.
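
For reference, both options are set the same way in *cassandra-env.sh* on
the replacement node (the IP placeholder is the dead node's address):

# Remove this line after the node has rejoined successfully
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<dead_node_ip>"

# Or use this form; no removal is required after the first successful start
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=<dead_node_ip>"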

Kind regards,
Anthony

On 13 November 2017 at 14:59, Erick Ramirez  wrote:

> Use the replace_address method with its own IP address. Make sure you
> delete the contents of the following directories:
> - data/
> - commitlog/
> - saved_caches/
>
> Forget rejoining with repair -- it will just cause more problems. Cheers!
>
> On Mon, Nov 13, 2017 at 2:54 PM, Anshu Vajpayee 
> wrote:
>
>> Hi All ,
>>
>> There was a node failure in one of our production clusters due to disk
>> failure. After h/w recovery that node is now ready to be part of the
>> cluster, but it doesn't have any data due to the disk crash.
>>
>>
>>
>> I can think of following option :
>>
>>
>>
>> 1. Replace the node with itself, using replace_address
>>
>> 2. Set bootstrap=false, start the node and run repair to stream the
>> data.
>>
>>
>>
>> Please suggest if both options are good and which is best as per your
>> experience. This is a live production cluster.
>>
>>
>> Thanks,
>>
>>
>> --
>> *C*heers,*
>> *Anshu V*
>>
>>
>>
>


Re: Restore cassandra snapshots

2017-10-17 Thread Anthony Grasso
Hi Pradeep,

If you are going to copy N snapshots to N nodes you will need to make sure
you have the System keyspace as part of that snapshot. The System keyspace
that is local to each node, contains the token allocations for that
particular node. This allows the node to work out what data it is
responsible for. Further to that, if you are restoring the System keyspace
from snapshots, make sure that the cluster name of the new cluster is
exactly the same as the cluster which generated the System keyspace
snapshots.

Regards,
Anthony

On 16 October 2017 at 23:28, Jean Carlo  wrote:

> HI,
>
> Yes of course, you can use sstableloader from every sstable to your new
> cluster. Actually this is the common procedure. Just check the log of
> cassandra, you shouldn't see any errors of streaming.
>
>
> However, given the fact you are migrating from one cluster of N nodes to
> another of N nodes, I believe you can just copy your data node by
> node and run a nodetool refresh, obviously checking the correct names
> of your sstables.
> You can check the tokens of your node using nodetool info -T
>
> But I think sstableloader is the easy way :)
>
>
>
>
> Saludos
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>
> On Mon, Oct 16, 2017 at 1:55 PM, Pradeep Chhetri 
> wrote:
>
>> Hi Jean,
>>
>> Thank you for the quick response. I am not sure how to achieve that. Can
>> I set the tokens for a node via cqlsh?
>>
>> I know that i can check the nodetool rings to get the tokens allocated to
>> a node.
>>
>> I was thinking of basically running sstableloader for each of the
>> snapshots and was assuming it will load the complete data properly. Isn't
>> that the case?
>>
>> Thank you.
>>
>> On Mon, Oct 16, 2017 at 5:21 PM, Jean Carlo 
>> wrote:
>>
>>> Hi,
>>>
>>> Be sure that you have the same tokens distribution than your original
>>> cluster. So if you are going to restore from old node 1 to new node 1, make
>>> sure that the new node and the old node have the same tokens.
>>>
>>>
>>> Saludos
>>>
>>> Jean Carlo
>>>
>>> "The best way to predict the future is to invent it" Alan Kay
>>>
>>> On Mon, Oct 16, 2017 at 1:40 PM, Pradeep Chhetri 
>>> wrote:
>>>
 Hi,

 I am trying to restore an empty 3-node cluster with the three snapshots
 taken on another 3-node cluster.

 What is the best approach to achieve it without loosing any data
 present in the snapshot.

 Thank you.
 Pradeep

>>>
>>>
>>
>


Re: Rebalance a cassandra cluster

2017-09-15 Thread Anthony Grasso
As Kurt mentioned, you definitely need to pick a partition key that ensures
data is uniformly distributed.

If you want to redistribute the data in the cluster and move tokens around,
you could decommission the node with the tokens you want to redistribute
and then bootstrap a new node into the cluster (see the sketch below).
However, be careful, because if there are unbalanced partitions in the
cluster, redistributing the tokens will just move the problem partition to
another node. In this case, the same problem will occur on the node that
picks up the problem partition key and you will be back in the same
situation again.
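
A rough outline of that token redistribution (assuming auto_bootstrap is
left at its default of true on the new node):

# On the node whose tokens should be redistributed
nodetool decommission

# Then configure and start the new node; it will bootstrap a fresh set of
# tokens and stream its share of the data from the rest of the cluster.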

Regards,
Anthony

On 13 September 2017 at 20:09, kurt greaves  wrote:

> You should choose a partition key that enables you to have a uniform
> distribution of partitions amongst the nodes and refrain from having too
> many wide rows/a small number of wide partitions. If your tokens are
> already uniformly distributed, recalculating in order to achieve a better
> data load balance is probably going to be an effort in futility, plus not
> really a good idea from a maintenance and scaling perspective.​
>


Re: Restarting nodes and reported load

2017-05-31 Thread Anthony Grasso
Hi Daniel,

When you say that the nodes have to be restarted, are you just restarting
the Cassandra service or are you restarting the machine?
How are you reclaiming disk space at the moment? Does disk space free up
after the restart?

Regarding storage on nodes, keep in mind the more data stored on a node,
the longer some operations to maintain that data will take to complete. In
addition, the more data that is on each node, the longer it will take to
stream data to other nodes. Whether it is replacing a down node or
inserting a new node, having a large amount of data on each node will mean
that it takes longer for a node to join the cluster if it is streaming the
data.

Kind regards,
Anthony

On 30 May 2017 at 02:43, Daniel Steuernol  wrote:

> The cluster is running with RF=3, right now each node is storing about 3-4
> TB of data. I'm using r4.2xlarge EC2 instances, these have 8 vCPU's, 61 GB
> of RAM, and the disks attached for the data drive are gp2 ssd ebs volumes
> with 10k iops. I guess this brings up the question of what's a good marker
> to decide on whether to increase disk space vs provisioning a new node?
>
>
>
> On May 29 2017, at 9:35 am, tommaso barbugli  wrote:
>
>> Hi Daniel,
>>
>> This is not normal. Possibly a capacity problem. What's the RF, how much
>> data do you store per node and what kind of servers do you use (core count,
>> RAM, disk, ...)?
>>
>> Cheers,
>> Tommaso
>>
>> On Mon, May 29, 2017 at 6:22 PM, Daniel Steuernol 
>> wrote:
>>
>>
>> I am running a 6 node cluster, and I have noticed that the reported load
>> on each node rises throughout the week and grows way past the actual disk
>> space used and available on each node. Also eventually latency for
>> operations suffers and the nodes have to be restarted. A couple questions
>> on this, is this normal? Also does cassandra need to be restarted every few
>> days for best performance? Any insight on this behaviour would be helpful.
>>
>> Cheers,
>> Daniel
>>
>>
>


Re: How do you do automatic restacking of AWS instance for cassandra?

2017-05-28 Thread Anthony Grasso
Hi Surbhi,

Please see my comment inline below.

On 28 May 2017 at 12:11, Jeff Jirsa  wrote:

>
>
> On 2017-05-27 18:04 (-0700), Surbhi Gupta 
> wrote:
> > Thanks a lot for all of your reply.
> > Our requirement is :
> > Our company releases an AMI almost every month containing some
> > security package or other.
> > So as per our security team we need to move our cassandra cluster to the
> > new AMI .
> > As this process happens every month, we would like to automate the
> process .
> > Few points to consider here:
> >
> > 1. We are using ephemeral drives to store cassandra data
> > 2. We are on dse 4.8.x
> >
> > So currently to do the process, we spin up new nodes with a new DC name
> > and join that DC, alter the keyspace, do a rebuild, and later alter the
> > keyspace
> > again to remove the old DC.
> >
> > But all of this process is manually done as of now.
> >
> > So i wanted to understand , on AWS, how do you do above kind of task
> > automatically ?
>
>
> At a previous employer, they used M4 class instances with data on a
> dedicated EBS volumes, so we could swap AMIs / stop / start / adjust
> instances without having to deal with this. This worked reasonably well for
> their scale (which was petabytes of data).
>

This is a really good option as it avoids streaming data to replace a node,
which could potentially be much quicker when dealing with large amounts of
data on each node.


>
> Other companies using ephemeral tend to be more willing to just terminate
> instances and replace them (-Dcassandra.replace_address). If you stop
> cassandra, then boot a replacement with 'replace_address' set, it'll take
> over for the stopped instance, including re-streaming all data (as best it
> can, subject to consistency level and repair status). This may be easier
> for you to script than switching your fleet to EBS, but it's not without
> risk.
>

A quick note if you do decide to go down this path. If you are using
Cassandra version 2.x.x and above, the cassandra.replace_address_first_boot
can also be used. This option works once when Cassandra is first started
and the replacement node inserted into the cluster. After that, the option
is ignored for all subsequent restarts, whereas cassandra.replace_address
needs to be removed from the *cassandra-env.sh* file in order to restart
the node. Restart behaviour aside, both options operate in the same way to
replace a node in the cluster.


>
>
>
>
>


Re: Reg:- Data Modelling Concepts

2017-05-17 Thread Anthony Grasso
Hi Nandan,

If there is a requirement to answer a query "What are the changes to a book
made by a particular user?", then yes the schema you have proposed can
work. To obtain the list of updates for a book by a user from the
*book_title_by_user* table will require the partition key (*book_title*),
the first clustering key (*book_id*), and the second clustering key (
*user_id*).

i.e. SELECT * FROM book_title_by_user WHERE book_title = <title> AND
book_id = <id> AND user_id = <id>;

If the book_id is unnecessary for answering the above query, it may be
worth changing the primary key ordering of the *book_title_by_user* table
to the following.

CREATE TABLE book_title_by_user(
  book_title text,
  book_id uuid,
  user_id uuid ,
  ts timeuuid,
  PRIMARY KEY (book_title, user_id, book_id, ts)
);

This will then simplify the select statement to

SELECT * FROM book_title_by_user WHERE book_title = <title> AND
user_id = <id>;

Kind regards,
Anthony

On 17 May 2017 at 13:05, @Nandan@  wrote:

> Hi Jon,
>
> We need to keep track of all updates, e.g. a 'User' of our platform can
> check what changes were made before.
> I am thinking in this way..
> CREATE TABLE book_info (
> book_id uuid,
> book_title text,
> author_name text,
> updated_at timestamp,
> PRIMARY KEY(book_id));
> This table will contain details about all book with unique updated
> details.
> CREATE TABLE book_title_by_user(
> book_title text,
> book_id uuid,
> user_id uuid ,
> ts timeuuid,
> primary key(book_title,book_id,user_id,ts));
> This table will contain details of multiple old updates of a book which
> can be made by multiple users, like MANY TO MANY.
>
> What do you think on this?
>
> On Wed, May 17, 2017 at 9:44 AM, Jonathan Haddad 
> wrote:
>
>> I don't understand why you need to store the old value a second time.  If
>> you know that the value went from A -> B -> C, just store the new value,
>> not the old.  You can see that it changed from A->B->C without storing it
>> twice.
>>
>> On Tue, May 16, 2017 at 6:36 PM @Nandan@ 
>> wrote:
>>
>>> The requirement is to create a DB in which we have to keep data of
>>> updated values as well as which user updated the particular book details
>>> and what they updated.
>>>
>>> We are like to create a schema which store book info, as well as the
>>> history of the update, made based on book_title, author, publisher, price
>>> changed.
>>> Like we want to store what was old data and what new data updated.. and
>>> also want to check which user updated the relevant change. Because suppose
>>> if some changes not made correctly then they can check changes and revert
>>> based on old values.
>>> We are trying to make a USER based Schema.
>>>
>>> For example:-
>>> id:- 1
>>> Name: - Harry Poter
>>> Author : - JK Rolling
>>>
>>> New Update Done by user_id 2:-
>>> id :- 1
>>> Name:- Harry Pottor
>>> Author:- J.K. Rolls
>>>
>>> Update history also need to store as :-
>>> User_id :- 2
>>> Old Author :- JK Rolling
>>> New Author :- J.K. Rolls
>>>
>>> So I need to update the details of the book, which is done by UPSERT.
>>> But I also have to keep details like which user updated it and what was
>>> updated.
>>>
>>>
>>> One thing that helps define the schema is knowing what queries will be
>>> made to the database up front.
>>> Few queries that the database needs to answer.
>>> What are the current details of a book?
>>> What is the most recent update to a particular book?
>>> What are the updates that have been made to a particular book?
>>> What are the details for a particular update?
>>>
>>>
>>> Update frequently will be like Update will happen based on Title, name,
>>> Author, price , publisher like. So not very high frequently.
>>>
>>> Best Regards,
>>> Nandan
>>>
>>
>


Re: Reg:- Data Modelling based on Update History details

2017-05-15 Thread Anthony Grasso
Hi Nandan,

Interesting project!

One thing that helps define the schema is knowing what queries will be made
to the database up front. It sounds like you have an idea already of what
those queries will be. I want to confirm that these are the queries that
the database needs to answer.

   - *What are the current details of a book?*
   - *What is the most recent update to a particular book?*
   - *What are the updates that have been made to a particular book?*
   - *What are the details for a particular update?*


Do the above queries sound correct and will the database need to answer any
other queries?

With regards to the data being stored; how frequently do the books get
updated? What type of details are stored for an update, as in, is it meta
information about the book (author, publish date etc) or is it changes to
the book content? Answers to these questions impact the schema structure.

Kind regards,
Anthony

On 15 May 2017 at 16:36, @Nandan@  wrote:

> Hi ,
> I am currently working on Book Management System in which I have a table
> which contains Books details in which PRIMARY KEY is book_id uuid.
> The requirement is to create a DB in which we have to keep data of updated
> values as well as which user updated the particular book details and what
> they updated.
>
> For example:-
> id:- 1
> Name: - Harry Poter
> Author : - JK Rolling
>
> New Update Done by user_id 2:-
> id :- 1
> Name:- Harry Pottor
> Author:- J.K. Rolls
>
> So I need to update the details of the book, which is done by UPSERT. But
> I also have to keep details like which user updated it and what was
> updated.
>
> I hope, I am able to describe my scenario in details. Please suggest on
> above scenario.
>
>


Re: Cassandra Snapshots and directories

2017-05-14 Thread Anthony Grasso
Hi Daniel,

Yes, you are right it does require some additional work to rsync just the
snapshots.

What about doing something like this to make the rsync syntax for the
backup easier?

# destination root for the backup (illustrative; set it to your location)
dest=/mnt/backups

# in the Cassandra data directory, iterate over each table's backup folder
for ks in $(find . -type d -iname backup)
do
  # iterate through each snapshot directory inside the backup folder
  for cf in $(ls ${ks})
  do
    # get the directory without the 'backup' path component in it
    out_ks=$(echo ${ks} | cut -d'/' -f2,3)

    # make the backup directory and perform the rsync
    mkdir -p ${dest}/${out_ks}/${cf}
    rsync -azP ${ks}/${cf}/ ${dest}/${out_ks}/${cf}
  done
done

Regards,
Anthony

On 12 May 2017 at 18:00, Daniel Hölbling-Inzko <
daniel.hoelbling-in...@bitmovin.com> wrote:

> Hi Varun,
> yes you are right - that's the structure that gets created. But if I want
> to backup ALL columnfamilies at once this requires a quite complex rsync as
> Vladimir mentioned.
> I can't just copy over the /data/keyspace directory as that contains all
> the data AND all the snapshots. I really have to go through this
> columnfamily by columnfamily which is annoying.
>
> greetings Daniel
>
> On Thu, 11 May 2017 at 22:48 Varun Gupta  wrote:
>
>>
>> I did not get your question completely, with "snapshot files are mixed
>> with files and backup files".
>>
>> When you call nodetool snapshot, it will create a directory with the
>> snapshot name if specified, or the current timestamp, at
>> /data/<keyspace>/<columnfamily>/backup/. This directory will have all
>> sstables, metadata files and schema.cql (if using 3.0.9 or higher).
>>
>>
>> On Thu, May 11, 2017 at 2:37 AM, Daniel Hölbling-Inzko <
>> daniel.hoelbling-in...@bitmovin.com> wrote:
>>
>>> Hi,
>>> I am going through this guide to do backup/restore of cassandra data to
>>> a new cluster:
>>> http://docs.datastax.com/en/cassandra/2.1/cassandra/
>>> operations/ops_backup_snapshot_restore_t.html#task_ds_cmf_11r_gk
>>>
>>> When creating a snapshot I get the snapshot files mixed in with the
>>> normal data files and backup files, so it's all over the place and very
>>> hard (especially with lots of tables per keyspace) to transfer ONLY the
>>> snapshot.
>>> (Mostly since there is a snapshot directory per table..)
>>>
>>> Am I missing something or is there some arcane shell command that
>>> filters out only the snapshots?
>>> Because this way it's much easier to just backup the whole data
>>> directory.
>>>
>>> greetings Daniel
>>>
>>
>>


Re: cassandra 3.10

2017-05-11 Thread Anthony Grasso
Hi Dhruva,

There are definitely some performance improvements to Storage Engine in
Cassandra 3.10 which make it worth the upgrade. Note that Cassandra 3.11
has further bug fixes and it may be worth considering a migration to that
version.

Regarding the issue of building a Cassandra 3.10 RPM, it sounds like the
team have built their own custom spec file? Has the team looked at using
the project spec file and associated instructions in Apache Cassandra
GitHub mirror?

https://github.com/apache/cassandra/tree/cassandra-3.10/redhat

Kind regards,
Anthony


On 11 May 2017 at 14:20, Gopal, Dhruva  wrote:

> Hi –
>
>   We’re currently on 3.9 and have been told that Cassandra 3.10 is a more
> stable version to be on. We’ve been using the datastax-ddc rpms in our
> production and dev environments (on 3.9) and it appears there is no 3.10
> rpm version out yet. We tried to build our own rpm (our devops processes
> use rpms, so changing to using tarballs is not easily done) and found that
> the build process fails (to do with the byteman-3.0.3 jar) that we manage
> to patch and get working (with rpmbuild). My concerns/questions are these:
>
> -  Is the 3.10 version actually stable enough given that the
> build failed (we obtained the source from this location:
> http://apache.mirrors.tds.net/cassandra/3.10/apache-
> cassandra-3.10-src.tar.gz) and used the attached patch file for byteman
> during the build process)?
>
> -  Are there any other issues with the binaries that we need to
> be aware of (other patches)?
>
>
>
> I’m concerned that there may be other issues and that we really won’t know
> since we’re not Cassandra experts, so looking for feedback from this group
> on whether we should just stay with 3.9 or if it’s safe to proceed with
> this approach. I can share the spec file and patch files that we’ve setup
> for the build process, if desired.
>
>
>
>
>
> Regards,
>
> *DHRUVA GOPAL*
>
> *sr. MANAGER, ENGINEERING*
>
> *REPORTING, ANALYTICS AND BIG DATA*
>
> *+1 408.325.2011 <+1%20408-325-2011>* *WORK*
>
> *+1 408.219.1094 <+1%20408-219-1094>* *MOBILE*
>
> *UNITED STATES*
>
> *dhruva.go...@aspect.com  *
>
> *aspect.com *
>
> [image: escription: http://webapp2.aspect.com/EmailSigLogo-rev.jpg]
>
>
> This email (including any attachments) is proprietary to Aspect Software,
> Inc. and may contain information that is confidential. If you have received
> this message in error, please do not read, copy or forward this message.
> Please notify the sender immediately, delete it from your system and
> destroy any copies. You may not further disclose or distribute this email
> or its attachments.
>
>
>


Re: Smart Table creation for 2D range query

2017-05-08 Thread Anthony Grasso
Hi Lydia,

Yes. This will define the *x*, *y* columns as the components of the
partition key. Note that by doing this, both *x* and *y* values will be
required at a minimum to perform a valid query.

Alternatively, the *x* and *y* values could be combined into a single
text field as Jon has suggested (see the sketch below).
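
A minimal sketch of that alternative, assuming coordinates are bucketed
into a text cell identifier (the table and column names are illustrative):

CREATE TABLE points_by_cell (
    cell text,      -- e.g. '0:0' for the unit square containing (x, y)
    x double,
    y double,
    m1 int,
    PRIMARY KEY (cell, x, y)
);

-- All points with 0 <= x <= 1 AND 0 <= y <= 1 then live in one partition:
SELECT * FROM points_by_cell WHERE cell = '0:0';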

Kind regards,
Anthony

On 7 May 2017 at 17:15, Lydia Ickler  wrote:

> Like this?
>
> CREATE TABLE test (
>   x double,
>   y double,
>   m1 int,
>   ...
>   m5 int,
>   PRIMARY KEY ((x,y), m1, … , m5)
> )
>
>
>
> Am 05.05.2017 um 21:54 schrieb Nitan Kainth :
>
> Make metadata as partition key and x,y as part of partition key i.e.
> Primary key. It should work
>
> Sent from my iPhone
>
> On May 5, 2017, at 2:40 PM, Lydia  wrote:
>
>
> Hi all,
>
>
> I am new to Apache Cassandra and I would like to get some advice on how to
> tackle a table creation / indexing in a sophisticated way.
>
>
> My aim is to store x- and y-coordinates, accompanied by some columns with
> meta information (m1, ... ,m5). There will be around 100,000,000 rows
> overall. Some rows might have the same (x,y) pairs but always distinct meta
> information.
>
>
> In the end I want to do a rather simple range query in the form of e.g.
> (0 <= x <= 1) AND (0 <= y <= 1).
>
>
> What would be the best choice of variables to set as primary key,
> partition key. Or should I use a index? And if so on what column(s)?
>
>
> Thanks in advance!
>
> Best regards,
>
> Lydia
>
>
>
>


Re: Very slow cluster

2017-04-30 Thread Anthony Grasso
Hi Eduardo,

Please see my comment inline below regarding your third question.

Regards,
Anthony

On 28 April 2017 at 21:26, Eduardo Alonso  wrote:

> Hi to all:
>
> I am having some problems with two of a client's cassandra:3.0.8 clusters
> that I want to share with you. These clusters are for QA and DEV.
>
> Cluster 1 (1 DC) is composed of 3 VMs (heap=4G, RAM=8G) sharing the
> same physical machine and sharing one SSD. I know this is not the best
> environment but it is only for testing purposes.
>
> The entire cluster runs very slowly and sometimes has failing inserts,
> which causes hints to be saved and replayed, and some data inconsistency
> with 2i queries.
>
> I know it is not the best environment (virtual machines sharing a
> physical machine and one physical disk) but it is very weird to me that
> the very same test case works like a charm in a 3-container Docker setup
> on my laptop (i7, 16G, SSD) but causes a lot of problems in their cluster.
>
> *listen_address* and *rpc_address* are set to external domain name (i. e:
> NODE_NAME.clientdomain.com). I have activated TRACE logs and get some
> strange messages
>
> So, my questions:
>
> *1.- Is it possible that one node sends a message to itself, triggering a
> READ_REPAIR?*
>
> TRACE [SharedPool-Worker-1] 2017-04-24 08:58:28,558
> MessagingService.java:750 - Message-to-self TYPE:MUTATION VERB:READ_REPAIR 
> going
> over MessagingService
>
> TRACE [SharedPool-Worker-1] 2017-04-16 04:38:47,513
> MessagingService.java:747 -01a.clientdomain.com/10.63.24.238
>  sending
> READ_REPAIR to 3426@/10.63.24.238"
>
> *Does this log line show one node asking itself for a portion of data
> that it does not have?*
>
> *2.-* I have another suspicious log line about slow vms:
>
> -WARN  [GossipTasks:1] 2017-04-14 00:32:44,371 FailureDetector.java:287 -
> Not marking nodes down due to local pause of 11195193520 > 5000000000
>
> *Does this line say that there is a pause in the JVM of 11 secs?* There
> are no garbage collector log lines. *Is it possible that this 11 secs
> pause is caused by a DNS lookup of the domain?*
>
>
> *3.-* I know that listen_address must be the external IP (Inter node
> communications will be faster, no need to dns lookup)
>
> *If I set listen_address to the external IP, is it necessary that the IP
> be pingable from all the other datacenter nodes?*
> *Does inter-datacenter communication use 'rpc_address' or
> 'listen_address'?*
>
>
All nodes in the cluster should be configured so that they can contact each
other. As far as being able to ping each other goes, enabling ICMP can be
useful for debugging internode communication problems.

Regarding internode communication; the *listen_address* is used for
internode communication in the cluster. Note that if you don't want to
manually specify an IP to *listen_address* for each node in your cluster,
leave it blank and Cassandra will use *InetAddress.getLocalHost()* to pick
an address.
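
As an illustration (the address here is just an example taken from the logs
above):

# cassandra.yaml
listen_address: 10.63.24.238   # internode traffic, including cross-DC
rpc_address: 10.63.24.238      # client (driver) connections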


> Thank you in advance
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> Eduardo Alonso
> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // www.stratio.com // 
> *@stratiobd
> *
>


Re: How can I scale my read rate?

2017-03-26 Thread Anthony Grasso
Keep in mind there are side effects to increasing to RF = 4

   - Storage requirements for each node will increase. Depending on the
   number of nodes in the cluster and the size of the data this could be
   significant.
   - Whilst the number of available coordinators increases, the number of
   nodes involved in QUORUM reads/writes will increase from 2 to 3.



On 24 March 2017 at 16:43, Alain Rastoul  wrote:

> On 24/03/2017 01:00, Eric Stevens wrote:
>
>> Assuming an even distribution of data in your cluster, and an even
>> distribution across those keys by your readers, you would not need to
>> increase RF with cluster size to increase read performance.  If you have
>> 3 nodes with RF=3, and do 3 million reads, with good distribution, each
>> node has served 1 million read requests.  If you increase to 6 nodes and
>> keep RF=3, then each node now owns half as much data and serves only
>> 500,000 reads.  Or more meaningfully in the same time it takes to do 3
>> million reads under the 3 node cluster you ought to be able to do 6
>> million reads under the 6 node cluster since each node is just
>> responsible for 1 million total reads.
>>
>> Hi Eric,
>
> I think I got your point.
> In case of really evenly distributed  reads it may (or should?) not make
> any difference,
>
> But when you do not distribute well the reads (and in that case only),
> my understanding about RF was that it could help spreading the load :
> In that case, with RF= 4 instead of 3,  with several clients accessing keys
> same key ranges, a coordinator could pick up one node to handle the request
> in 4 replicas instead of picking up one node in 3 , thus having
> more "workers" to handle a request ?
>
> Am I wrong here ?
>
> Thank you for the clarification
>
>
> --
> best,
> Alain
>
>


Re: Gotchas when creating a lot of tombstones

2014-01-10 Thread Anthony Grasso
Hi Robert,

It sounds like you have done a fair bit of investigating and testing
already. Have you considered using a time-based data model to avoid doing
deletions in the database? A sketch of the idea is below.
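
A minimal sketch (the keyspace, table, and TTL are hypothetical; adjust to
your schema), assuming a Cassandra version that supports the
default_time_to_live table option:

    CREATE TABLE browse.structures_by_day (
        day text,             -- time bucket, e.g. '2014-01-09'
        id timeuuid,
        payload blob,
        PRIMARY KEY (day, id)
    ) WITH default_time_to_live = 604800;  -- rows expire after 7 days

With this layout the cleanup process never issues explicit DELETEs;
expired cells still become tombstones internally and are purged during
compaction, but they arrive steadily rather than in large deletion bursts.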

Regards,
Anthony


On Thu, Jan 9, 2014 at 1:26 PM, sankalp kohli kohlisank...@gmail.com wrote:

 With leveled compaction, you will have some data which cannot be
 reclaimed even with gc grace = 0 because it has not been compacted yet. For
 this you might want to look at tombstone_threshold


 On Wed, Jan 8, 2014 at 10:31 AM, Tyler Hobbs ty...@datastax.com wrote:


 On Wed, Jan 1, 2014 at 7:53 AM, Robert Wille rwi...@fold3.com wrote:


 Also, for this application, it would be quite reasonable to set gc grace
 seconds to 0 for these tables. Zombie data wouldn’t really be a problem.
 The background process that cleans up orphaned browse structures would
 simply re-delete any deleted data that reappeared.


 If you can set gc grace to 0, that will basically eliminate your
 tombstone concerns entirely, so I would suggest that.


 --
 Tyler Hobbs
 DataStax http://datastax.com/





Re: Recommended amount of free disk space for compaction

2013-11-29 Thread Anthony Grasso
Hi Robert,

We found having about 50% free disk space is a good rule of thumb.
Cassandra will typically use less than that when running compactions;
however, it is good to have the free space available in case it compacts
some of the larger SSTables in the keyspace. More information can be found
on the DataStax website [1]
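
As a rough worked example, assuming a hypothetical node with 2 TB of
usable disk: keeping the live data load under roughly 1 TB leaves enough
headroom for a compaction that temporarily rewrites the node's largest
SSTables. 'nodetool status' reports each node's load, and 'df -h' shows
the disk space remaining.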

If you have a situation where only one node in the cluster is running low
on disk space and all other nodes are fine for disk space, there are two
things you can do.
1) Run 'nodetool repair -pr' on each node to ensure that the data in each
node's primary token range is consistent across replicas (this should be
run periodically anyway).
2) Run targeted compactions on the problem node using 'nodetool compact
[keyspace] [table]', where [table] is the table whose SSTables on that
node need to be reduced in size. A concrete sketch follows below.
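
For example, assuming hypothetical keyspace and table names:

    nodetool cfstats                       # identify the largest tables
    nodetool repair -pr                    # repair this node's primary range
    nodetool compact my_keyspace my_table  # compact only the problem table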

Note that having a single node that uses all its disk space while the other
nodes are fine implies that there could be underlying issues with the node.

Regards,
Anthony

[1]
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/architecture/architecturePlanningDiskCapacity_t.html


On Fri, Nov 29, 2013 at 10:48 PM, Sankalp Kohli kohlisank...@gmail.com wrote:

 Apart from the compaction, you might want to also look at free space
 required for repairs.
  This could be a problem if you have large rows, as repair is not done at
  the column level.




  On Nov 28, 2013, at 19:21, Robert Wille rwi...@fold3.com wrote:
 
  I’m trying to estimate our disk space requirements and I’m wondering
 about disk space required for compaction.
 
  My application mostly inserts new data and performs updates to existing
 data very infrequently, so there will be very few bytes removed by
 compaction. It seems that if a major compaction occurs, performing the
 compaction will require as much disk space as is currently consumed by the
 table.
 
  So here’s my question. If Cassandra only compacts one table at a time,
 then I should be safe if I keep as much free space as there is data in the
 largest table. If Cassandra can compact multiple tables simultaneously,
 then it seems that I need as much free space as all the tables put
 together, which means no more than 50% utilization. So, how much free space
 do I need? Any rules of thumb anyone can offer?
 
  Also, what happens if a node gets low on disk space and there isn’t
 enough available for compaction? If I add new nodes to reduce the amount of
 data on each node, I assume the space won’t be reclaimed until a compaction
 event occurs. Is there a way to salvage a node that gets into a state where
 it cannot compact its tables?
 
  Thanks
 
  Robert
 



Re: Data loss when swapping out cluster

2013-11-29 Thread Anthony Grasso
Hi Robert,

In this case would it be possible to do the following to replace a seed
node?

nodetool disablethrift
nodetool disablegossip
nodetool drain

stop Cassandra

deep copy /var/lib/cassandra/* on old seed node to new seed node

start Cassandra on new seed node
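
For the deep copy step, something along these lines should work (the
hostname is hypothetical; assumes SSH access between the nodes and that
Cassandra is stopped on the old node):

    rsync -avH /var/lib/cassandra/ newseed.example.com:/var/lib/cassandra/

If the new node comes up with a different IP address, remember to update
the seed lists in cassandra.yaml across the cluster to reference it.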

Regards,
Anthony


On Wed, Nov 27, 2013 at 6:20 AM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Nov 26, 2013 at 9:48 AM, Christopher J. Bottaro 
 cjbott...@academicworks.com wrote:

 One thing that I didn't mention, and I think may be the culprit after
 doing a lot or mailing list reading, is that when we brought the 4 new
 nodes into the cluster, they had themselves listed in the seeds list.  I
 read yesterday that if a node has itself in the seeds list, then it won't
 bootstrap properly.


 https://issues.apache.org/jira/browse/CASSANDRA-5836

 =Rob



Re: Cluster Management

2013-08-29 Thread Anthony Grasso
Hi Patricia,

Thank you for the feedback. It has been helpful.


On Tue, Aug 27, 2013 at 12:02 AM, Patricia Gorla
gorla.patri...@gmail.com wrote:

 Anthony,

 We use a number of tools to manage our Cassandra cluster.

 * Datastax OpsCenter [0] for at a glance information, and trending
 statistics. You can also run operations through here, though I prefer
 to use nodetool for any mutative operation.
 * nodetool for ad hoc status checks, and day-to-day node management.
 * puppet for setup and initialization

  For example, if I want to make some changes to the configuration file
 that resides on each node, is there a tool that will propagate the change
 to each node?

 For this, we use puppet to manage any changes to the configurations
 (which are stored in git). We initially had Cassandra auto-restart
 when the configuration changed, but you might not want the node to
 automatically join a cluster, so we turned this off.


Puppet was the first thing that came to mind for us as well. In addition,
we had the same thought about auto-restarting nodes when the configuration
is changed. If a configuration on all the nodes is changed, we would want
to restart one node at a time and wait for it to rejoin before restarting
the next one. I am assuming that in a case like this you then manually
perform the restart operation for each node? (Something like the sketch
below is what we have in mind.)
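
A rough sketch of a rolling restart (hostnames are hypothetical; assumes
SSH access to each node and that nodetool can reach each node's JMX port):

    for host in cass01 cass02 cass03; do
        ssh "$host" 'nodetool drain && sudo service cassandra restart'
        # wait until the restarted node reports gossip active before
        # moving on to the next node
        until nodetool -h "$host" info 2>/dev/null | grep -q 'Gossip active.*true'; do
            sleep 10
        done
    done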



  Another example is if I want to have a rolling repair (nodetool repair
 -pr) and clean up running on my cluster, is there a tool that will help
 manage/configure that?

 Multiple commands to the cluster are sent via clusterssh [1] (cssh for
 OS X). I can easily choose which nodes to control, and run those in
 sync. For any rolling procedures, we send commands one at a time,
 though we've considered sending some of these tasks to cron.


Thanks again for the tip! This is quite interesting; it may help to solve
our immediate problem for now.

Regards,
Anthony



 Hope this helps.

 Cheers,
 Patricia


 [0] http://planetcassandra.org/Download/DataStaxCommunityEdition
 [1] http://sourceforge.net/projects/clusterssh/



Re: Cluster Management

2013-08-29 Thread Anthony Grasso
Thanks Nate! We will look into this one to see if we can use it.

Regards,
Anthony


On Tue, Aug 27, 2013 at 12:22 AM, Nate McCall n...@thelastpickle.com wrote:


 For example, if I want to make some changes to the configuration file
 that resides on each node, is there a tool that will propagate the change
 to each node?

 You may also want to take a look at Priam from the Netflix folks:
 https://github.com/Netflix/Priam

 Assumes AWS (though some of this is becoming more pluggable).



Cluster Management

2013-08-25 Thread Anthony Grasso
Hi Cassandra Users,

Before I go ahead and create my own solution... are there any tools that
exist to help with the management of a Cassandra cluster?

For example, if I want to make some changes to the configuration file that
resides on each node, is there a tool that will propagate the change to
each node?

Another example is if I want to have a rolling repair (nodetool repair -pr)
and clean up running on my cluster, is there a tool that will help
manage/configure that?

Any feedback would be greatly appreciated.

Thanks,

Anthony