Re: Cassandra DCOS | monitoring connection and user activity

2018-10-31 Thread Anup Shirolkar
Hi,

It looks like you need monitoring for Cassandra, but without using JMX.
It is possible to use the metrics reporting libraries in Cassandra:
https://wiki.apache.org/cassandra/Metrics#Reporting

I do not have specific experience with Cassandra on DCOS, but monitoring
with these libraries and tools should not be any different.

There are various options available for establishing good monitoring (Graphite,
Prometheus, Grafana).

Helpful links:

https://blog.pythian.com/monitoring-apache-cassandra-metrics-graphite-grafana/
https://github.com/instaclustr/cassandra-exporter
https://prometheus.io/docs/instrumenting/exporters/
https://grafana.com/dashboards/5408
http://thelastpickle.com/blog/2017/04/05/cassandra-graphite-measurements-filtered.html
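
If you go the Graphite route via Cassandra's pluggable metrics reporters, a
minimal sketch of the setup looks like this (file locations, the Graphite host
and the metric pattern are assumptions; verify that the reporter jar ships
with the DC/OS Cassandra package):

# Write a reporter config into the conf directory (assumed to be /etc/cassandra here).
cat > /etc/cassandra/metrics-reporter.yaml <<'EOF'
graphite:
  - period: 60
    timeunit: 'SECONDS'
    prefix: 'cassandra.node1'
    hosts:
      - host: 'graphite.example.com'
        port: 2003
    predicate:
      color: 'white'
      useQualifiedName: true
      patterns:
        - '^org.apache.cassandra.metrics.+'
EOF

# Point the JVM at the reporter config, then restart the node.
echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.metricsReporterConfigFile=metrics-reporter.yaml"' \
  | sudo tee -a /etc/cassandra/cassandra-env.sh

For the connection-count part of the question, the Client metrics
(connectedNativeClients, if I remember the name correctly) exposed through the
same reporters should cover it.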

Regards,

Anup Shirolkar




On Wed, 31 Oct 2018 at 18:41, Caesar, Maik  wrote:

> Hello All,
>
> Does anyone have experience with monitoring Cassandra in DCOS?
>
> If we increase the load on Cassandra in DCOS, the application gets
> timeouts and loses the connection, and I do not have any information about
> what happened.
>
> Is there a way to get information about the number of current connections
> and which queries are executed? Cassandra in DCOS has the JMX interface
> disabled, and I think nodetool does not provide such information.
>
>
>
> Regards
>
> Maik
>
>


Re: Merge Cassandra RACs?

2018-10-25 Thread Anup Shirolkar
Hi,

To answer your question about merging Cassandra racks: the only safe way to
migrate nodes across physical racks is to perform a DC migration, rather than
decommissioning and re-adding nodes one by one. It should only be attempted
when the system is stable and has no other issues; a rough outline of the
approach is sketched below.
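
A very rough sketch of that approach, with placeholder keyspace/DC names and
replication factors (adjust to your topology):

# 1. Bring up the replacement nodes as a new logical DC with the rack layout you
#    want, defined per node in cassandra-rackdc.properties
#    (GossipingPropertyFileSnitch), e.g. dc=DC2_new, rack=RACK1.
# 2. Extend replication to the new DC:
cqlsh -e "ALTER KEYSPACE my_ks WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3, 'DC2_new': 3};"
# 3. Stream existing data to each new node, naming an existing DC as the source:
nodetool rebuild -- DC2
# 4. Repoint clients at DC2_new, remove DC2 from the keyspace replication,
#    then decommission the old DC2 nodes one by one.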

Regarding the unique-rack situation: why have you run out of racks?
Are you currently installing each node in its own new rack?

There could be simpler alternatives depending upon your rack/data centre
situation.

Regards,

Anup Shirolkar




On Fri, 26 Oct 2018 at 04:52, Ian Spence  wrote:

> Environment: Cassandra: 2.2.9, JRE: 1.8.0_74, CentOS 6/7
>
> We have two DCs. In DC1 we have 3 RACs and in DC2 we have 6.
>
> Because we're in a physical environment (not virtual or cloud based),
> we've run
> short on unique rack space in DC2 and need to fix the layout problems.
>
> Is it possible to somehow merge Cassandra racks without having to
> decommission
> the nodes and re-add them to the cluster one by one? Or is that the
> correct way to accomplish this?
>
> Thanks
>
>


Re: Nodetool info for heap usage

2018-10-22 Thread Anup Shirolkar
Hi,

The nodetool output should be accurate and reliable.

However, using the nodetool command for monitoring is not a good idea:
nodetool incurs its own resource overhead each time it is invoked.

Ideally, you should use a standard monitoring tool or method instead.
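
For an occasional manual check the nodetool route is fine, for example:

# Ad-hoc heap check; fine now and then, but each invocation starts a JVM and a
# JMX connection, so it is too heavy to poll every few seconds.
nodetool info | grep 'Heap Memory'

For continuous monitoring, the same heap figures are available from the
metrics registry / JMX (java.lang:type=Memory) via a Graphite or Prometheus
reporter, which avoids the per-invocation overhead.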

Regards,

Anup Shirolkar




On Tue, 23 Oct 2018 at 07:16, Abdul Patel  wrote:

> Hi All,
>
> Is the nodetool info output accurate for monitoring memory usage? Initially,
> with 3.1.0, we monitored heap usage via nodetool info and it never reported
> it as high; after upgrading to 3.11.2 we started seeing high usage with
> nodetool info, and we later upgraded to 3.11.3 with the same behaviour.
> Just wanted to make sure whether monitoring heap memory usage via nodetool
> info is correct, or whether it is actually a memory leak issue in 3.11.2 and
> 3.11.3?
>


Re: Upgrade to version 3

2018-10-17 Thread Anup Shirolkar
Hi,

Yes, you can upgrade from 2.2 to 3.11.3.

The upgrade steps are covered on many blogs and sites.

 You can follow:
https://myopsblog.wordpress.com/2017/12/04/upgrade-cassandra-cluster-from-2-x-to-3-x/

You should also read NEWS.txt for release-specific notes while planning the
upgrade:
https://github.com/apache/cassandra/blob/trunk/NEWS.txt

Please see the mail archive thread below for your specific case of 2.2 to 3.x:
https://www.mail-archive.com/user@cassandra.apache.org/msg45381.html
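
As a rough per-node sketch (the Debian package commands are assumptions; adapt
to your installation method):

nodetool drain                          # flush memtables; the node stops accepting writes
sudo service cassandra stop
sudo apt-get install cassandra=3.11.3   # or the yum/tarball equivalent
# merge your existing cassandra.yaml / cassandra-env.sh changes into the new config files
sudo service cassandra start
nodetool upgradesstables                # rewrite SSTables into the new on-disk format

Repeat one node at a time, and avoid repairs or bootstraps while the cluster
is running mixed versions.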

Regards,

Anup Shirolkar




On Thu, 18 Oct 2018 at 09:30, Mun Dega  wrote:

> Hello,
>
> If we are upgrading from version 2.2 to 3.x, should we go directly to
> latest version 3.11.3?
>
> Anything we need to look out for?  If anyone can point to an upgrade
> process that would be great!
>
>
>


Re: Upgraded to 3.0.17, stop here or move forward?

2018-10-10 Thread Anup Shirolkar
Hi Ricardo,

Yes, there is no harm in executing upgradesstables multiple times.

Regarding the aggregate functions: you mentioned the data is pre-aggregated
into buckets. Does that mean the records used by an aggregate function are
all part of a single partition?

In my opinion, query performance depends on selecting exact partitions.
So, as long as your aggregation function operates over data in a single
partition for each query, the performance should scale well.
Otherwise, the queries could end up scanning across partitions; in that case,
if the number of records scanned is limited, it should still work, but with
some performance degradation.

However, the best way to know this is to load test it.
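
For illustration only (keyspace, table and column names are made up), this is
the kind of partition-restricted aggregate that tends to scale well:

cqlsh <<'EOF'
-- Pre-aggregated events bucketed per sensor and day; one bucket == one partition.
CREATE KEYSPACE IF NOT EXISTS metrics
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE IF NOT EXISTS metrics.events_by_bucket (
    sensor_id text,
    bucket    date,
    ts        timestamp,
    value     double,
    PRIMARY KEY ((sensor_id, bucket), ts)
);
-- The WHERE clause pins a single partition, so min/max/avg only read that partition's rows.
SELECT min(value), max(value), avg(value)
FROM metrics.events_by_bucket
WHERE sensor_id = 'sensor-42' AND bucket = '2018-10-01';
EOF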

An alternative is to perform the aggregate operations in the application that
queries this data, e.g. using Spark. A good reason to use this approach is
that non-optimal Cassandra queries can impact the performance of the whole
cluster.

Hope this helps.

Regards,

Anup Shirolkar




On Wed, 10 Oct 2018 at 20:11, Riccardo Ferrari  wrote:

> Thank you Anup,
>
> Yup, upgradesstables is a step I generally take before (to make sure I'm
> already on the latest sstable version) and after, to make sure I'm upgrading
> to the latest sstable version supported by the new release. I know it's
> redundant and not necessary, but I read it does not hurt.
>
> I am looking into those aggregations to solve a simple use case where data
> cardinality should be limited in volume: that is, finding min, max and avg
> for some events over a limited time span in the past. We are working on
> pre-processed data pre-aggregated into buckets. The cardinality should be
> below 6K records in the worst case. Any comments? Is there a better
> approach that does not involve moving data into another storage/index?
>
> Thanks
>
>
> On Wed, Oct 10, 2018 at 1:30 AM Anup Shirolkar <
> anup.shirol...@instaclustr.com> wrote:
>
>> Hi,
>>
>> Yes, it makes sense to move to 3.11.3.
>> The release has features and bug fixes that should be useful to your
>> cluster.
>>
>> However, if you are planning to use GROUP BY, UDFs, etc., please be
>> cautious about the performance implications they may have if not used with
>> suitable queries.
>>
>> I am not aware of any specific doc for performing the upgrade, but the
>> steps you are following look fine.
>>
>> However, the `upgradesstables` step is not in the correct place in the
>> upgrade sequence. The upgrade sequence should be:
>>
>> - snapshot
>> - drain and stop
>> - backup configs
>> - install new release
>> - review config updates (patch existing config)
>> - start Cassandra
>> - *upgradesstables*
>>
>> *Not to forget*: Perform upgrade on one node at a time.
>>
>> Regards,
>>
>> Anup Shirolkar
>>
>> Instaclustr <https://www.instaclustr.com/>
>>
>>
>> On Wed, 10 Oct 2018 at 00:30, Riccardo Ferrari 
>> wrote:
>>
>>> Hi list,
>>>
>>> We recently upgraded our small cluster to the latest 3.0.17. Everything
>>> was nice and smooth, however I am wondering if it makes sense to keep
>>> moving forward and upgrade to the latest 3.11.3?
>>>
>>> We really need something like GROUP BY, and UDF/UDA seems limited with
>>> respect to our use-case.
>>>
>>> Does it make sense?
>>> Any argument against?
>>> Is there any doc to prepare such upgrade?
>>> My current workflow was very easy:
>>> - snapshot
>>> - upgradesstables
>>> - drain and stop
>>> - backup configs
>>> - install new release
>>> - review config updates (patch existing config)
>>> - start
>>>
>>> Thanks
>>>
>>


Re: Upgraded to 3.0.17, stop here or move forward?

2018-10-09 Thread Anup Shirolkar
Hi,

Yes, it makes sense to move to 3.11.3.
The release has features and bug fixes that should be useful to your
cluster.

However, if you are planning to use GROUP BY, UDFs, etc., please be cautious
about the performance implications they may have if not used with suitable
queries.
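
As an illustrative sketch (the schema is made up; GROUP BY requires 3.10+),
GROUP BY stays cheap when the partition is pinned and the grouping follows
the clustering order:

cqlsh <<'EOF'
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE IF NOT EXISTS demo.readings (
    sensor_id text,
    day       date,
    ts        timestamp,
    value     double,
    PRIMARY KEY ((sensor_id), day, ts)
);
-- One partition, grouped by the first clustering column: cheap.
SELECT day, min(value), max(value), avg(value)
FROM demo.readings
WHERE sensor_id = 'sensor-42'
GROUP BY day;
-- The same grouping without the sensor_id restriction (GROUP BY sensor_id, day)
-- would have to scan every partition in the table.
EOF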

I am not aware of any specific doc for performing the upgrade, but the steps
you are following look fine.

However, the `upgradesstables` step is not in the correct place in the
upgrade sequence. The upgrade sequence should be:

- snapshot
- drain and stop
- backup configs
- install new release
- review config updates (patch existing config)
- start Cassandra
- *upgradesstables*

*Not to forget*: Perform upgrade on one node at a time.

Regards,

Anup Shirolkar

Instaclustr <https://www.instaclustr.com/>


On Wed, 10 Oct 2018 at 00:30, Riccardo Ferrari  wrote:

> Hi list,
>
> We recently upgraded our small cluster to the latest 3.0.17. Everything
> was nice and smooth, however I am wondering if it makes sense to keep
> moving forward and upgrade to the latest 3.11.3?
>
> We really need something like GROUP BY, and UDF/UDA seems limited with
> respect to our use-case.
>
> Does it make sense?
> Any argument against?
> Is there any doc to prepare such upgrade?
> My current workflow was very easy:
> - snapshot
> - upgradesstables
> - drain and stop
> - backup configs
> - install new release
> - review config updates (patch existing config)
> - start
>
> Thanks
>


Re: Odd CPU utilization spikes on 1 node out of 30 during repair

2018-09-26 Thread Anup Shirolkar
Hi,

Most things look OK with your setup.

You can enable debug logging for the duration of the repair.
This will help identify whether you are hitting a bug or some other cause of
the unusual behaviour.

Just a remote possibility: do you have other things running on the nodes
besides Cassandra, and do they consume additional CPU at times?
You can check per-process CPU consumption to keep an eye on non-Cassandra
processes.
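
For example (log levels are per-node and worth reverting after the repair
window; tool availability on the hosts is an assumption):

# Repair debug logging for the duration of the repair:
nodetool setlogginglevel org.apache.cassandra.repair DEBUG
# Per-process CPU, five one-minute samples, to see whether the extra CPU
# belongs to the Cassandra JVM or to something else on the node:
pidstat -u 60 5
# Revert the log level when done:
nodetool setlogginglevel org.apache.cassandra.repair INFO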


Regards,

Anup Shirolkar

On Wed, 26 Sep 2018 at 21:32, Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> On Wed, Sep 26, 2018 at 1:07 PM Anup Shirolkar <
> anup.shirol...@instaclustr.com> wrote:
>
>>
>> Looking at the information you have provided, the increased CPU utilisation
>> could be because of the repair running on the node.
>> Repairs are resource intensive operations.
>>
>> Restarting the node would have halted the repair operation, bringing the
>> CPU back to normal.
>>
>
> The repair was running on all nodes at the same time, still only one node
> had CPU significantly different from the rest of the nodes.
> As I've mentioned: we are running non-incremental parallel repair using
> Cassandra Reaper.
> After the node was restarted, new repair tasks were given to it by Reaper
> and it was doing repair as previously, but this time
> without exposing the odd behavior.
>
>> In some cases, repairs trigger additional operations, e.g. compactions and
>> anti-compactions. These operations could cause extra CPU utilisation.
>> What is the compaction strategy used on the majority of keyspaces?
>>
>
> For the 2 tables involved in this regular repair we are using
> TimeWindowCompactionStrategy with time windows of 30 days.
>
>> Talking about CPU utilisation *percentage*: although it has doubled, the
>> increase is 15 percentage points.
>> It would be interesting to know the number of CPU cores on these nodes to
>> judge the absolute increase in CPU utilisation.
>>
>
> All nodes are using the same hardware on AWS EC2: r4.xlarge, they have 4
> vCPUs.
>
>> You should try to find the root cause behind the behaviour and decide
>> course of action.
>>
>
> Sure, that's why I was asking for ideas how to find the root cause. :-)
>
>> Effective use of monitoring and logs can help you identify the root cause.
>>
>
> As I've mentioned, we do have monitoring and I've checked the logs, but
> that didn't help to identify the issue so far.
>
> Regards,
> --
> Alex
>
>


Re: Odd CPU utilization spikes on 1 node out of 30 during repair

2018-09-26 Thread Anup Shirolkar
Hi,

Looking at the information you have provided, the increased CPU utilisation
could be because of the repair running on the node.
Repairs are resource-intensive operations.

Restarting the node would have halted the repair operation, bringing the CPU
back to normal.

If you run repairs regularly but have observed this increase in CPU
utilisation for the first time, it could be an area of concern. Otherwise,
repairs using extra CPU is normal.

In some cases, repairs trigger additional operations, e.g. compactions and
anti-compactions. These operations could cause extra CPU utilisation.
What is the compaction strategy used on the majority of keyspaces?

Talking about CPU utilisation *percentage*: although it has doubled, the
increase is 15 percentage points. It would be interesting to know the number
of CPU cores on these nodes to judge the absolute increase in CPU
utilisation.
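
Two quick checks for the points above (illustrative):

nodetool compactionstats -H   # any repair-triggered compactions/anti-compactions in flight?
nproc                         # core count, to turn the percentage into an absolute figure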

You should try to find the root cause behind this behaviour and decide on a
course of action.
Effective use of monitoring and logs can help you identify the root cause.

Regards,
Anup

On Wed, 26 Sep 2018 at 17:34, Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> Hello,
>
> On our production cluster of 30 Apache Cassandra 3.0.17 nodes we have
> observed that only one node started to show about 2 times the CPU
> utilization as compared to the rest (see screenshot): up to 30% vs. ~15% on
> average for the other nodes.
>
> This started more or less immediately after repair was started (using
> Cassandra Reaper, parallel, non-incremental) and lasted up until we've
> restarted this node.  After restart the CPU use is in line with the rest of
> nodes.
>
> All other metrics that we are monitoring for these nodes were in line with
> the rest of the cluster.
>
> The logs on the node don't show anything odd, no extra warn/error/info
> messages, not more minor or major GC runs as compared to other nodes during
> the time we were observing this behavior.
>
> What could be the reason for this behavior?  How should we debug it if
> that happens next time instead of just restarting?
>
> Cheers,
> --
> Alex
>
>



-- 

Anup Shirolkar

Consultant

+61 420 602 338



Re: User Defined Types?

2018-08-05 Thread Anup Shirolkar
Hi,

A few of the caveats can be found here:
https://issues.apache.org/jira/browse/CASSANDRA-7423

That JIRA was implemented in version *3.6*, and you are on 3.0, so you are
affected by the pre-3.6 UDT behaviour mentioned there: UDTs must be frozen and
are stored as a single blob-like value, so individual fields cannot be updated
in place.
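
For reference, a minimal pre-3.6 style definition looks like this (keyspace
and names are illustrative); note the frozen<> wrapper, which is what forces
the whole value to be rewritten on every update:

cqlsh <<'EOF'
CREATE KEYSPACE IF NOT EXISTS shop
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE shop;
CREATE TYPE IF NOT EXISTS address (street text, city text, zip text);
CREATE TABLE IF NOT EXISTS customers (
    id uuid PRIMARY KEY,
    billing_address frozen<address>   -- on 3.0, UDT columns must be frozen
);
-- On 3.0 you cannot update billing_address.city in place; the full UDT value
-- has to be written each time.
EOF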

Cheers,
Anup

On 5 August 2018 at 23:29, shalom sagges  wrote:

> Hi All,
>
> Are there any known caveats for User Defined Types in Cassandra (version
> 3.0)?
> One of our teams wants to start using them. I wish to assess it and see if
> it'd be wise (or not) to refrain from using UDTs.
>
>
> Thanks!
>



-- 

Anup Shirolkar

Consultant

+61 420 602 338



Re: cqlsh COPY ... TO ... doesn't work if one node down

2018-07-01 Thread Anup Shirolkar
Hi,

The error shows that the cqlsh connection to the down node failed,
so you should debug why that happened.

Although you specified another node ('10.0.0.154') in the cqlsh command,
my guess is that the down node was present in the driver's connection pool,
hence a connection to it was attempted.

Ideally, the availability of data should not be hampered by the
unavailability of one replica out of 5. Also, the stack trace shows a 'cqlsh'
connection error rather than a read failure.

I think once you get the connection issue sorted, the COPY should work as usual.
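
If the node stays down while you need the export, a hedged workaround is to
retry with more forgiving timeout/retry settings (the option values are
illustrative):

# First confirm the down node really is unreachable from this host:
nc -zv -w 5 10.0.0.47 9042
# Then retry the export with longer paging timeouts and more attempts:
cqlsh 10.0.0.154 --connect-timeout=10 --request-timeout=60 -e \
  "COPY X.Y TO 'backup/X.Y' WITH NUMPROCESSES=1 AND PAGETIMEOUT=60 AND MAXATTEMPTS=10"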

Regards,
Anup


On 30 June 2018 at 15:05, Dmitry Simonov  wrote:

> Hello!
>
> I have cassandra cluster with 5 nodes.
> There is a (relatively small) keyspace X with RF5.
> One node goes down.
>
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address  Load   Tokens   Owns (effective)  Host
> ID   Rack
> UN  10.0.0.82   253.64 MB  256  100.0%
> 839bef9d-79af-422c-a21f-33bdcf4493c1  rack1
> UN  10.0.0.154  255.92 MB  256  100.0%
> ce23f3a7-67d2-47c0-9ece-7a5dd67c4105  rack1
> UN  10.0.0.76   461.26 MB  256  100.0%
> c8e18603-0ede-43f0-b713-3ff47ad92323  rack1
> UN  10.0.0.94   575.78 MB  256  100.0%
> 9a324dbc-5ae1-4788-80e4-d86dcaae5a4c  rack1
> DN  10.0.0.47   ?  256  100.0%
> 7b628ca2-4e47-457a-ba42-5191f7e5374b  rack1
>
> I try to export some data using COPY TO, but it fails after long retries.
> Why does it fail?
> How can I make a copy?
> There must be 4 copies of each row on other (alive) replicas.
>
> cqlsh 10.0.0.154 -e "COPY X.Y TO 'backup/X.Y' WITH NUMPROCESSES=1"
>
> Using 1 child processes
>
> Starting copy of X.Y with columns [key, column1, value].
> 2018-06-29 19:12:23,661 Failed to create connection pool for new host
> 10.0.0.47:
> Traceback (most recent call last):
>   File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/cluster.py",
> line 2476, in run_add_or_renew_pool
> new_pool = HostConnection(host, distance, self)
>   File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/pool.py",
> line 332, in __init__
> self._connection = session.cluster.connection_factory(host.address)
>   File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/cluster.py",
> line 1205, in connection_factory
> return self.connection_class.factory(address, self.connect_timeout,
> *args, **kwargs)
>   File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/connection.py",
> line 332, in factory
> conn = cls(host, *args, **kwargs)
>   File 
> "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/io/asyncorereactor.py",
> line 344, in __init__
> self._connect_socket()
>   File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/connection.py",
> line 371, in _connect_socket
> raise socket.error(sockerr.errno, "Tried connecting to %s. Last error:
> %s" % ([a[4] for a in addresses], sockerr.strerror or sockerr))
> OSError: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last
> error: timed out
> 2018-06-29 19:12:23,665 Host 10.0.0.47 has been marked down
> 2018-06-29 19:12:29,674 Error attempting to reconnect to 10.0.0.47,
> scheduling retry in 2.0 seconds: [Errno None] Tried connecting to
> [('10.0.0.47', 9042)]. Last error: timed out
> 2018-06-29 19:12:36,684 Error attempting to reconnect to 10.0.0.47,
> scheduling retry in 4.0 seconds: [Errno None] Tried connecting to
> [('10.0.0.47', 9042)]. Last error: timed out
> 2018-06-29 19:12:45,696 Error attempting to reconnect to 10.0.0.47,
> scheduling retry in 8.0 seconds: [Errno None] Tried connecting to
> [('10.0.0.47', 9042)]. Last error: timed out
> 2018-06-29 19:12:58,716 Error attempting to reconnect to 10.0.0.47,
> scheduling retry in 16.0 seconds: [Errno None] Tried connecting to
> [('10.0.0.47', 9042)]. Last error: timed out
> 2018-06-29 19:13:19,756 Error attempting to reconnect to 10.0.0.47,
> scheduling retry in 32.0 seconds: [Errno None] Tried connecting to
> [('10.0.0.47', 9042)]. Last error: timed out
> 2018-06-29 19:13:56,834 Error attempting to reconnect to 10.0.0.47,
> scheduling retry in 64.0 seconds: [Errno None] Tried connecting to
> [('10.0.0.47', 9042)]. Last error: timed out
> 2018-06-29 19:15:05,887 Error attempting to reconnect to 10.0.0.47,
> scheduling retry in 128.0 seconds: [Errno None] Tried connecting to
> [('10.0.0.47', 9042)]. Last error: timed out
> 2018-06-29 19:17:18,982 Error attempting to reconnect to 10.0.0.47,
> scheduling retry in 256.0 seconds: [Errno None] Tried connecting to
> [('10.0.0.47', 9042)]. Last error: timed out
> 2018-06-29 19:21:40,064 Error attempting to reconnect to 10.0.0.47,
> scheduling retry in 512.0 seconds: [Errn

Re: Frequency of rebuild_index

2018-05-02 Thread Anup Shirolkar
contd..

when can the discrepancy in the index arise. Any specific example?

I cannot pinpoint an exact situation. I was referring to situations that can
adversely affect data replication and consistency, e.g. single or multiple
node failures/recoveries.

anything specific to stratio-lucene-index

If you want extensive search-like functionality or a special kind of
secondary indexing, you can explore the stratio-lucene-index option.

Thanks,
Anup

On 3 May 2018 at 10:02, Anup Shirolkar <anup.shirol...@instaclustr.com>
wrote:

> Hi,
>
> when can the discrepancy in the index arise. Any specific example?
>
>
>  any documentation which says the index automatically rebuilds/keeps
>> itself up to date after updations and deletions
>
>
> I was unable to locate anything saying this in Apache C* docs. But here is
> Datastax link if that is good for you
> https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/
> dmlIndexInternals.html?hl=secondary%2Cindex
>
>
>
> On 30 April 2018 at 17:51, Akshit Jain <akshit13...@iiitd.ac.in> wrote:
>
>> Hi,
>> This looks good but when can the discrepancy in the index arise. Any
>> specific example?
>> Is there any documentation which says the index automatically
>> rebuilds/keeps itself up to date after updations and deletions. Also if
>> there anything specific to stratio-lucene-index.
>>
>> Regards
>> Akshit Jain
>> 9891724697
>>
>> On Fri, Apr 27, 2018 at 9:59 AM, Anup Shirolkar <
>> anup.shirol...@instaclustr.com> wrote:
>>
>>> Hi,
>>>
>>> The secondary indices in Cassandra are maintained continuously as data
>>> is written. Also index rebuilding is kicked off automatically when you
>>> create a new index. So, there is no good reason to schedule nodetool
>>> rebuild_index regularly.
>>>
>>> However, if you find any discrepancy in the index and data you should
>>> run it. Ideally, this should not happen but if it is required as a result
>>> of any major activity/failure you can use it.
>>>
>>> Talking about the load it puts on system, it depends upon the size of
>>> index itself. Although it will consume resources, it should not give a
>>> major performance hit to the system.
>>>
>>> Regards,
>>> Anup
>>>
>>> On 27 April 2018 at 13:46, Akshit Jain <akshit13...@iiitd.ac.in> wrote:
>>>
>>>> Hi,
>>>> How frequently one should run nodetool rebuild_index and what's its
>>>> impact on performance in terms of iops,cpu utilisation etc.
>>>>
>>>> Regards
>>>>
>>>>
>>>
>>
>


Re: Frequency of rebuild_index

2018-05-02 Thread Anup Shirolkar
Hi,

when can the discrepancy in the index arise. Any specific example?


 any documentation which says the index automatically rebuilds/keeps itself
> up to date after updations and deletions


I was unable to find anything stating this in the Apache C* docs, but here is
a DataStax link, if that works for you:
https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlIndexInternals.html?hl=secondary%2Cindex



On 30 April 2018 at 17:51, Akshit Jain <akshit13...@iiitd.ac.in> wrote:

> Hi,
> This looks good but when can the discrepancy in the index arise. Any
> specific example?
> Is there any documentation which says the index automatically
> rebuilds/keeps itself up to date after updations and deletions. Also if
> there anything specific to stratio-lucene-index.
>
> Regards
> Akshit Jain
> 9891724697
>
> On Fri, Apr 27, 2018 at 9:59 AM, Anup Shirolkar <
> anup.shirol...@instaclustr.com> wrote:
>
>> Hi,
>>
>> The secondary indices in Cassandra are maintained continuously as data is
>> written. Also index rebuilding is kicked off automatically when you create
>> a new index. So, there is no good reason to schedule nodetool rebuild_index
>> regularly.
>>
>> However, if you find any discrepancy in the index and data you should run
>> it. Ideally, this should not happen but if it is required as a result of
>> any major activity/failure you can use it.
>>
>> Talking about the load it puts on system, it depends upon the size of
>> index itself. Although it will consume resources, it should not give a
>> major performance hit to the system.
>>
>> Regards,
>> Anup
>>
>> On 27 April 2018 at 13:46, Akshit Jain <akshit13...@iiitd.ac.in> wrote:
>>
>>> Hi,
>>> How frequently one should run nodetool rebuild_index and what's its
>>> impact on performance in terms of iops,cpu utilisation etc.
>>>
>>> Regards
>>>
>>>
>>
>


Re: Frequency of rebuild_index

2018-04-26 Thread Anup Shirolkar
Hi,

Secondary indexes in Cassandra are maintained continuously as data is
written, and index rebuilding is kicked off automatically when you create a
new index. So there is no good reason to schedule nodetool rebuild_index
regularly.

However, if you find any discrepancy between the index and the data, you
should run it. Ideally this should not happen, but if it is required as a
result of a major activity or failure, you can use it.
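
The syntax, for reference (keyspace/table/index names are placeholders; check
`nodetool help rebuild_index` on your version for the exact index-name
format):

nodetool rebuild_index my_keyspace my_table my_index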

Regarding the load it puts on the system: it depends on the size of the index
itself. Although it will consume resources, it should not cause a major
performance hit to the system.

Regards,
Anup

On 27 April 2018 at 13:46, Akshit Jain  wrote:

> Hi,
> How frequently should one run nodetool rebuild_index, and what is its impact
> on performance in terms of IOPS, CPU utilisation, etc.?
>
> Regards
>
>


Re: Large size KS management

2018-04-19 Thread Anup Shirolkar
Hi Aiman,

Can you please clarify whether the mentioned size of 800GB includes the
replication factor (RF) or not? Either way, what is the RF?

Also, what method was used to measure the keyspace data size, e.g. directory
size, a nodetool command, etc.?
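
For example, two common per-node measurements (paths and names are
assumptions):

nodetool cfstats my_keyspace | grep 'Space used (live)'   # live data size per table (2.2 command name)
du -sh /var/lib/cassandra/data/my_keyspace                # on-disk size, including any snapshots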

It would be helpful to know about the cluster node configurations and
topology used.

Based on the information we have, 800GB across 15 nodes gives roughly 53GB of
data per node, which is quite normal for a Cassandra cluster.

A question about data growth: what is the estimated rate at which the data
will grow?

If you can clarify these points, it will be easier to discuss specific
solutions.

Thanks,
Anup

On 20 April 2018 at 12:08, Aiman Parvaiz  wrote:

> Hi all
>
> I have been given a 15-node C* 2.2.8 cluster to manage which has a large
> KS (~800GB). Given the size of the KS, most of the management tasks
> like repair take a long time to complete, and disk space management is
> becoming tricky from the systems perspective.
>
>
> This KS size is going to grow in the future and we have a business
> requirement of long data retention here. I wanted to share this with all of
> you and ask what my options are here: what would be the best way to deal
> with a large KS like this one? To make the situation even trickier, low IO
> latency is expected from this cluster as well.
>
>
> Thankful for any suggestions/advice in advance.
>
>
>
>


Re: Cassandra 3.7 - Problem with Repairs - all nodes failing

2018-04-19 Thread Anup Shirolkar
Contd.

Upgrading from 3.7 to 3.11.1 does not involve any major changes.
It can be achieved without downtime and should not impact Cassandra clients.
You can test the upgrade on a test cluster first to be sure, if you are
considering upgrading prod.
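
While planning that, a few hedged checks may help narrow the failure down
(paths and names are assumptions):

# Look at the context around the validation (merkle tree) errors:
grep -B2 -A2 'Failed creating a merkle tree' /var/log/cassandra/system.log | tail -n 40
# Reproduce against a single small table to rule out cluster-wide causes:
nodetool repair -full -pr my_keyspace my_small_table
# Check the validation thread pool for blocked or dropped tasks:
nodetool tpstats | grep -i -E 'Pool|Validation'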

Thanks,
Anup

On 20 April 2018 at 13:28, Anup Shirolkar <anup.shirol...@instaclustr.com>
wrote:

> Hi Leena,
>
> The repairs are most likely failing because of a bug in Cassandra 3.7.
> I don't have a JIRA reference handy, but there are quite a few known issues
> in this version.
>
> Considering your scenario, it is highly recommended that you upgrade to
> 3.11.1.
> Although you have mentioned that upgrading is not an option, I would like
> to tell you that
>
> On 19 April 2018 at 23:19, Leena Ghatpande <lghatpa...@hotmail.com> wrote:
>
>> we have 8 node prod cluster running on cassandra 3.7. Our 2 largest
>> tables have around 100M and 30M rows respectively while all others are
>> relatively smaller.
>>
>> we have been running repairs on alternate days on 2 of our keyspaces.
>> We run repair on each node in the cluster with the -pr option on every
>> table within each keyspace individually. Repairs are run sequentially on
>> each node
>> These were working fine, but with no change on the systems, they have
>> started failing since last month.
>>
>> The repairs have started failing for each table on every node with no
>> specific error.
>>
>> I have tried running scrub on every table and then running repair , but
>> still the repair fails for all tables.
>>
>> Our smallest table with only 100 rows also fails on repair.
>>
>> But if I run the repair with the DC option (-dc localdatacenter) for local
>> datacenters, then the repairs are successful. Is this an indication that
>> the repairs are good?
>> We would still want the repairs to work on individual tables as
>> expected.
>>
>> Need help trying to get the repairs to work properly as we have a big
>> migration planned for june .
>>
>> Upgrading cassandra is not an option right now.
>>
>>
>> Here are some of the errors
>> INFO  [AntiEntropyStage:1] 2018-04-18 20:36:51,461 RepairSession.java:181
>> - [repair #223c73c2-4372-11e8-8749-89fc1dde5b7d] Received merkle tree
>> for clients from / IP
>> ERROR [ValidationExecutor:213] 2018-04-18 20:36:51,461 Validator.java:261
>> - Failed creating a merkle tree for [repair 
>> #223c73c2-4372-11e8-8749-89fc1dde5b7d
>> on secure/clients, [(1849652111528073119,1856811324137977760],
>> (3733211856223440695,3737790228588239952], 
>> (-2500456349659149537,-2498953852677197491],
>> (1735271399836012489,1735412813423041471], 
>> (1871725370007007817,1890457592856328448],
>> (4316163881057906640,4323247409810431754], 
>> (4286141602946572160,4308169130179803373],
>> (5189663040558066167,5193871822490506231], 
>> (7160723554094225326,7161133449395023060],
>> (-4363807597425543488,-4361416517953194804],
>> (7008956720664744733,7022523551326267501], 
>> (-5742986989228874052,-5734436401879059890],
>> (1828335330499002859,1849652111528073119], 
>> (7072368932695202361,7144087505892848370],
>> (-5791935107311742541,-5781988493712029404],
>> (7754917992280096132,7754953485457609099]]], /130.5.123.234 (see log for
>> details)
>> ERROR [ValidationExecutor:213] 2018-04-18 20:36:51,461
>> CassandraDaemon.java:217 - Exception in thread
>> Thread[ValidationExecutor:213,1,main]
>> java.lang.NullPointerException: null
>> INFO  [AntiEntropyStage:1] 2018-04-18 20:36:51,461 RepairSession.java:181
>> - [repair #223c73c2-4372-11e8-8749-89fc1dde5b7d] Received merkle tree
>> for clients from /IP
>> ERROR [Repair#113:12] 2018-04-18 20:36:51,461 CassandraDaemon.java:217 -
>> Exception in thread Thread[Repair#113:12,5,RMI Runtime]
>> com.google.common.util.concurrent.UncheckedExecutionException:
>> org.apache.cassandra.exceptions.RepairException: [repair
>> #223c73c2-4372-11e8-8749-89fc1dde5b7d on secure/clients,
>> [(1849652111528073119,1856811324137977760],
>> (3733211856223440695,3737790228588239952], 
>> (-2500456349659149537,-2498953852677197491],
>> (1735271399836012489,1735412813423041471], 
>> (1871725370007007817,1890457592856328448],
>> (4316163881057906640,4323247409810431754], 
>> (4286141602946572160,4308169130179803373],
>> (5189663040558066167,5193871822490506231], 
>> (7160723554094225326,7161133449395023060],
>> (-4363807597425543488,-4361416517953194804],
>> (7008956720664744733,7022523551326267501], 
>> (-5742986989228874052,-5734436401879059890],
>> (182

Re: Cassandra 3.7 - Problem with Repairs - all nodes failing

2018-04-19 Thread Anup Shirolkar
Hi Leena,

The repairs are most likely failing because of a bug in Cassandra 3.7. I
don't have a JIRA reference handy, but there are quite a few known issues in
this version.

Considering your scenario, it is highly recommended that you upgrade to
3.11.1.
Although you have mentioned that upgrading is not an option, I would like
to tell you that

On 19 April 2018 at 23:19, Leena Ghatpande  wrote:

> We have an 8-node prod cluster running Cassandra 3.7. Our 2 largest tables
> have around 100M and 30M rows respectively, while all others are relatively
> smaller.
>
> we have been running repairs on alternate days on 2 of our keyspaces.
> We run repair on each node in the cluster with the -pr option on every
> table within each keyspace individually. Repairs are run sequentially on
> each node
> These were working fine, but with no change on the systems, they have
> started failing since last month.
>
> The repairs have started failing for each table on every node with no
> specific error.
>
> I have tried running scrub on every table and then running repair, but
> the repair still fails for all tables.
>
> Our smallest table with only 100 rows also fails on repair.
>
> But if I run the repair with the DC option (-dc localdatacenter) for local
> datacenters, then the repairs are successful. Is this an indication that the
> repairs are good?
> We would still want the repairs to work on individual tables as expected.
>
> Need help trying to get the repairs to work properly, as we have a big
> migration planned for June.
>
> Upgrading cassandra is not an option right now.
>
>
> Here are some of the errors
> INFO  [AntiEntropyStage:1] 2018-04-18 20:36:51,461 RepairSession.java:181
> - [repair #223c73c2-4372-11e8-8749-89fc1dde5b7d] Received merkle tree for
> clients from / IP
> ERROR [ValidationExecutor:213] 2018-04-18 20:36:51,461 Validator.java:261
> - Failed creating a merkle tree for [repair 
> #223c73c2-4372-11e8-8749-89fc1dde5b7d
> on secure/clients, [(1849652111528073119,1856811324137977760],
> (3733211856223440695,3737790228588239952], 
> (-2500456349659149537,-2498953852677197491],
> (1735271399836012489,1735412813423041471], 
> (1871725370007007817,1890457592856328448],
> (4316163881057906640,4323247409810431754], 
> (4286141602946572160,4308169130179803373],
> (5189663040558066167,5193871822490506231], 
> (7160723554094225326,7161133449395023060],
> (-4363807597425543488,-4361416517953194804], 
> (7008956720664744733,7022523551326267501],
> (-5742986989228874052,-5734436401879059890], 
> (1828335330499002859,1849652111528073119],
> (7072368932695202361,7144087505892848370], 
> (-5791935107311742541,-5781988493712029404],
> (7754917992280096132,7754953485457609099]]], /130.5.123.234 (see log for
> details)
> ERROR [ValidationExecutor:213] 2018-04-18 20:36:51,461
> CassandraDaemon.java:217 - Exception in thread
> Thread[ValidationExecutor:213,1,main]
> java.lang.NullPointerException: null
> INFO  [AntiEntropyStage:1] 2018-04-18 20:36:51,461 RepairSession.java:181
> - [repair #223c73c2-4372-11e8-8749-89fc1dde5b7d] Received merkle tree for
> clients from /IP
> ERROR [Repair#113:12] 2018-04-18 20:36:51,461 CassandraDaemon.java:217 -
> Exception in thread Thread[Repair#113:12,5,RMI Runtime]
> com.google.common.util.concurrent.UncheckedExecutionException:
> org.apache.cassandra.exceptions.RepairException: [repair
> #223c73c2-4372-11e8-8749-89fc1dde5b7d on secure/clients,
> [(1849652111528073119,1856811324137977760], 
> (3733211856223440695,3737790228588239952],
> (-2500456349659149537,-2498953852677197491], 
> (1735271399836012489,1735412813423041471],
> (1871725370007007817,1890457592856328448], 
> (4316163881057906640,4323247409810431754],
> (4286141602946572160,4308169130179803373], 
> (5189663040558066167,5193871822490506231],
> (7160723554094225326,7161133449395023060], 
> (-4363807597425543488,-4361416517953194804],
> (7008956720664744733,7022523551326267501], 
> (-5742986989228874052,-5734436401879059890],
> (1828335330499002859,1849652111528073119], 
> (7072368932695202361,7144087505892848370],
> (-5791935107311742541,-5781988493712029404], 
> (7754917992280096132,7754953485457609099]]]
> Validation failed in /130.5.127.60
> at com.google.common.util.concurrent.Futures.
> wrapAndThrowUnchecked(Futures.java:1525) ~[guava-18.0.jar:na]
> at com.google.common.util.concurrent.Futures.
> getUnchecked(Futures.java:1511) ~[guava-18.0.jar:na]
> at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:160)
> ~[apache-cassandra-3.7.jar:3.7]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> ~[na:1.8.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> ~[na:1.8.0_45]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_45]
> Caused by: org.apache.cassandra.exceptions.RepairException: [repair
> #223c73c2-4372-11e8-8749-89fc1dde5b7d on clients, 
>