Re: Re: How to gracefully decommission a highly loaded node?

2018-12-17 Thread Riccardo Ferrari
I am having "the same" issue. One of my nodes seems to be struggling with its hardware: out of 6 nodes (same instance size), this one is likely to be marked down, it is constantly compacting, the system load is high, it's just a big pain. My idea was to add nodes and decommission all the ones running on old

Re: AWS r5.xlarge vs i3.xlarge

2018-12-10 Thread Riccardo Ferrari
f spinning disks. Both i3 and r5d are EBS optimized Regards, On Mon, Dec 10, 2018 at 2:38 PM Oleksandr Shulgin < oleksandr.shul...@zalando.de> wrote: > On Mon, Dec 10, 2018 at 12:20 PM Riccardo Ferrari > wrote: > >> I am wondering what instance type is best for a sm

AWS r5.xlarge vs i3.xlarge

2018-12-10 Thread Riccardo Ferrari
Hi list! I am wondering what instance type is best for a small Cassandra cluster on AWS. Actually I'd like to compare, or have your opinion about, the following instances: - r5*d*.xlarge (4 vCPU, *19* ECU, 32GB RAM and 1 NVMe instance store 150GB) - Need to attach a 600/900GB EBS -

Re: Sporadic high IO bandwidth and Linux OOM killer

2018-12-06 Thread Riccardo Ferrari
11:14 AM Riccardo Ferrari > wrote: > >> >> I had a few instances in the past that were showing that unresponsiveness >> behaviour. Back then I saw with iotop/htop/dstat ... the system was stuck >> on a single thread processing (full throttle) for seconds. According t

Re: Sporadic high IO bandwidth and Linux OOM killer

2018-12-06 Thread Riccardo Ferrari
leksandr Shulgin < oleksandr.shul...@zalando.de> wrote: > On Wed, 5 Dec 2018, 19:34 Riccardo Ferrari >> Hi Alex, >> >> I saw that behaviour in the past. >> > > Riccardo, > > Thank you for the reply! > > Do you refer to kswapd issue only or have y

Re: Sporadic high IO bandwidth and Linux OOM killer

2018-12-05 Thread Riccardo Ferrari
Hi Alex, I saw that behaviour in the past. I can tell you the kswapd0 usage is connected to the `disk_access_mode` property. On 64-bit systems it defaults to mmap. That also explains why your virtual memory is so high (it somehow matches the node load, right?). I cannot find any good reference

Re: Correct repair job

2018-11-21 Thread Riccardo Ferrari
Hi, Given the number of nodes, I would consider deploying a tool like Cassandra Reaper. Best, On Wed, Nov 21, 2018 at 6:28 AM Pranab Bordoloi wrote: > Hi, > Yes, it needs to run on every node. This may help - > https://www.datastax.com/dev/blog/repair-in-cassandra

Re: Upgraded to 3.0.17, stop here or move forward?

2018-10-10 Thread Riccardo Ferrari
> - review config updates (patch existing config) > - start Cassandra > - *upgradesstables* > > *Not to forget*: Perform upgrade on one node at a time. > > Regards, > > Anup Shirolkar > > Instaclustr <https://www.instaclustr.com/> > > > On Wed, 10 Oct 20

Upgraded to 3.0.17, stop here or move forward?

2018-10-09 Thread Riccardo Ferrari
Hi list, We recently upgraded our small cluster to the latest 3.0.17. Everything was nice and smooth, however I am wondering if it makes sense to keep moving forward and upgrade to the latest 3.11.3? We really need something like GROUP BY, and UDF/UDA seems limited wrt our use-case. Does it

Re: About UDF/UDA

2018-09-26 Thread Riccardo Ferrari
NC finalFunction] > INITCOND initCond; > > > The final return type will be the return type of the FINALFUNC and not > necessarily the stateType > > More details by reading my blog post on it: > http://www.doanduyhai.com/blog/?p=1876 > > On Wed, Sep 26, 2018 at 3:58 PM, Riccardo Ferra

About UDF/UDA

2018-09-26 Thread Riccardo Ferrari
Hi users! Given my Cassandra version 3.0.x I don't have the famous GROUP BY operator available. So looking around I turned to UDAs. I'm aware all/most of the magic happens on the coordinator and the plan is to keep the data volume low to avoid too much pressure. Q1: How much is low volume. It's

Re: jmxterm "#NullPointerException: No such PID "

2018-09-18 Thread Riccardo Ferrari
Hi Philip, I've used jmxterm myself without any particular problems. On my systems too, I don't get the cassandra daemon listed when issuing the `jvms` command but I never spent much time investigating it. Assuming you have not changed anything relevant in the cassandra-env.sh you can

Re: Read timeouts when performing rolling restart

2018-09-18 Thread Riccardo Ferrari
you to the root > cause of the issue. > > C*heers, > ------- > Alain Rodriguez - @arodream - al...@thelastpickle.com > France / Spain > > The Last Pickle - Apache Cassandra Consulting > http://www.thelastpickle.com > > Le jeu. 13 sept. 2018 à 09:50, Riccardo Ferrari

Re: Read timeouts when performing rolling restart

2018-09-13 Thread Riccardo Ferrari
imeouts (probably due to GC and hints). > > Hope this helps! > > On Thu, Sep 13, 2018 at 2:20 AM Riccardo Ferrari > wrote: > >> A little update on the progress. >> >> First: >> Thank you Thomas. I checked the code in the patch and briefly skimmed >> through

Re: Read timeouts when performing rolling restart

2018-09-12 Thread Riccardo Ferrari
>> https://issues.apache.org/jira/browse/CASSANDRA-8236 >> >> >> >> which looks similar, but above was marked as fixed in 2.2. >> >> >> >> Thomas >> >> >> >> *From:* Riccardo Ferrari >> *Sent:* Mittwoch, 12. September 20

Re: Read timeouts when performing rolling restart

2018-09-12 Thread Riccardo Ferrari
Hi Alain, Thank you for chiming in! I was thinking of performing the 'start_native_transport=false' test as well, and indeed the issue is not showing up. Starting the/a node with native transport disabled and letting it cool down led to no timeout exceptions, no dropped messages, simply a crystal

Read timeouts when performing rolling restart

2018-09-12 Thread Riccardo Ferrari
Hi list, We are seeing the following behaviour when performing a rolling restart: On the node I need to restart: * I run the 'nodetool drain' * Then 'service cassandra restart'. So far so good. The load increase on the other 5 nodes is negligible. The node is generally out of service just for
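The two per-node steps quoted above can be sketched as a small shell function. This is only an illustration under stated assumptions: restart_node and the DRY_RUN switch are hypothetical names, not something from the thread, and the commands are the ones given in the message (they will likely need root privileges).

```shell
# Sketch of the per-node restart step from the message above.
# restart_node and DRY_RUN are hypothetical names used for illustration;
# with DRY_RUN=1 the commands are printed instead of executed.
restart_node() {
  run=${DRY_RUN:+echo}
  # Drain first so memtables are flushed before the service restarts;
  # the restart only runs if the drain succeeded.
  $run nodetool drain &&
    $run service cassandra restart
}
```

Running it as `DRY_RUN=1 restart_node` prints the two commands in order, which is a cheap way to check the sequence before touching a live node.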

Re: How to downloading Cassandra 3.11.0 and 3.11.2 binaries for ubuntu

2018-08-04 Thread Riccardo Ferrari
You should be able to do: apt-get install cassandra=3.11.2, the same applies to cassandra-tools. Have a look here: https://askubuntu.com/a/92021 Also, I find apt-cache madison useful to list all the available versions. HTH On Sat, Aug 4, 2018 at 3:56 PM, R1 J1 wrote: > What are the steps to
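As a minimal sketch of the advice above: list the versions the configured repositories offer, then pin both packages to the same version. The function name install_cassandra_version and the DRY_RUN switch are hypothetical; the apt commands are the ones named in the message and will likely need root privileges.

```shell
# Sketch: list available versions, then install a pinned one.
# install_cassandra_version and DRY_RUN are hypothetical names;
# with DRY_RUN=1 the commands are printed instead of executed.
install_cassandra_version() {
  version=$1
  run=${DRY_RUN:+echo}
  # Show every version of the package known to the configured repositories.
  $run apt-cache madison cassandra
  # Pin cassandra and cassandra-tools to the same version.
  $run apt-get install "cassandra=$version" "cassandra-tools=$version"
}
```

For example, `DRY_RUN=1 install_cassandra_version 3.11.2` prints the two commands without installing anything.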

Re: how to make cassandra listen not on 127.0.0.1 on 9042

2018-07-20 Thread Riccardo Ferrari
Hi, Have a look at the rpc_address description http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html?highlight=rpc_address#rpc-address. What does your hostname resolve to? Best, On Fri, Jul 20, 2018 at 3:09 PM, Vitaliy Semochkin wrote: > Hi > > I'm building a

Re: concurrent_compactors via JMX

2018-07-19 Thread Riccardo Ferrari
> This (or similar settings) worked for distinct cases having heavy read >> patterns. In the mailing list I explained recently to someone else my >> understanding of JVM and GC, also there is a blog post my colleague Jon >> wrote here: http://thelastpickle.com/blog/2018/04/1

Re: concurrent_compactors via JMX

2018-07-18 Thread Riccardo Ferrari
Chris, Thank you for the mbean reference. On Wed, Jul 18, 2018 at 6:26 PM, Riccardo Ferrari wrote: > Alain, thank you for your email. I really really appreciate it! > > I am actually trying to remove the disk io from the suspect list, thus I > want to reduce the number of concurrent comp

Re: concurrent_compactors via JMX

2018-07-18 Thread Riccardo Ferrari
, and I must say for most of the cluster I work on, default the > tuning is not that good and can keep server busy 10-15% of the time with > stop the world GC. > You might find this post of my colleague Jon about GC tuning for Apache > Cassandra interesting: http://thelastpickle

concurrent_compactors via JMX

2018-07-17 Thread Riccardo Ferrari
Hi list, Cassandra 3.0.6. I'd like to test changing concurrent compactors to see if it helps when the system is under stress. Can someone point me to the right mbean? I cannot really find good docs about mbeans (or tools ...) Any suggestion much appreciated, best

Added a new node, now what repair is best?

2018-07-01 Thread Riccardo Ferrari
Hi list, After a long time in operation we need to grow our cluster. This cluster was born on 2.X and almost 2 years ago migrated to 3.0.6 (I know we are a bit prudent). The cluster was 3 m1.xlarge (we are on AWS) and table RF was 3. Thanks to your valuable hints we added a new

Re: Cassandra 3.0.X migarte to VPC

2018-06-08 Thread Riccardo Ferrari
Thank you guys! Much appreciated. Leaving the snitch aside for a moment, we can fix this either before or after the migration. I understand I should prefer adding a new DC rather than extending/shrinking my current (and only) DC. Correct? Thanks, On Fri, Jun 8, 2018 at 2:22 AM, kurt greaves

Cassandra 3.0.X migarte to VPC

2018-06-07 Thread Riccardo Ferrari
Dear list, We have a small cluster on AWS EC2-Classic and we are planning to move it to a VPC. I know this has been discussed a few times already, including here:

Re: 3.0.6 - CorruptSSTableException

2017-11-07 Thread Riccardo Ferrari
, Nov 7, 2017 at 6:54 PM, <adama.diab...@orange.com> wrote: > Hi Riccardo, > > > > The following may help me, as the case described there is similar to yours > ! > > https://engineering.gosquared.com/dealing-corrupt-sstable-cassandra > > > > Regards. >

3.0.6 - CorruptSSTableException

2017-11-06 Thread Riccardo Ferrari
Hi list, It happened that one of the EC2 instances in our cluster got rebooted. Unfortunately, when it came back, Cassandra 3.0.6 failed to restart, complaining about: ERROR [NonPeriodicTasks:1] 2017-11-04 03:44:20,019 LogTransaction.java:204 - Unable to delete //system/local/ma-292-big-Data.db as it does

Re: Upgrade from 3.0.6, where's the documentation?

2017-06-15 Thread Riccardo Ferrari
Jeff, Thank you so much for your answer. If you say there are 2 very important fixes in the next release, I believe we can wait a couple of weeks. Thanks! On Fri, Jun 16, 2017 at 12:35 AM, Jeff Jirsa <jji...@apache.org> wrote: > > > On 2017-06-14 07:05 (-0700), Riccardo Ferrari <

Upgrade from 3.0.6, where's the documentation?

2017-06-14 Thread Riccardo Ferrari
Hi list, It's been a while since I upgraded my C* to 3.0.6, nevertheless I would like to give TWCS a try (available since 3.0.7). What happened to the upgrade documentation? I used to read a step-by-step procedure from DataStax but it looks like they are not supporting it anymore, on the

Re: Failure when setting up cassandra in cluster

2016-08-22 Thread Riccardo Ferrari
Hi that's very likely because of: > > empty the listen_address entry and # Leaving it blank leaves it up to InetAddress.getLocalHost(). This # will always do the Right Thing _if_ the node is properly configured # (hostname, name resolution, etc), and the Right Thing is to use the # address

JVM Crash on 3.0.6

2016-08-11 Thread Riccardo Ferrari
Hi C* users, Recently I had a couple of my nodes crash (on different dates). I don't have core dumps, however my JVM crash logs go like this: === # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at

Re: Verify cassandra backup and restore in C * 2.1

2016-08-09 Thread Riccardo Ferrari
Hi Indranil, I think it really depends on what makes a backup "correct" for you. Do you have some test you can run on that data? When I want to test my data I usually restore it in a new cluster (ie. on AWS) and use Spark to perform some cross-tests. This is a bit cumbersome nevertheless does the

Re: Gossip Threshold

2016-07-25 Thread Riccardo Ferrari
Hi Jean, I think this is a good resource: https://www.youtube.com/watch?v=FuP1Fvrv6ZQ Best, On Mon, Jul 25, 2016 at 2:45 PM, jean paul wrote: > As i find in cassandra Documentation, the gossip process runs every second. > Please, why you have chosen 'running it *every

Re: (C)* stable version after 3.5

2016-07-18 Thread Riccardo Ferrari
Check the "Compatibility" section of the Cassandra Java driver. Since the driver is backward compatible, when we upgraded we first upgraded our applications to the latest Java driver version, then we upgraded our C* cluster. Best, On Mon, Jul 18, 2016 at 9:06 AM, Varun Barala

Re: Is my cluster normal?

2016-07-12 Thread Riccardo Ferrari
4 0 >>>> 0 >>>> >>>> MigrationStage0 0 35 0 >>>> 0 >>>> >>>> MemtablePostFlush 0 0 1973 0 >>>

Re: NoHostAvailableException coming up on our server

2016-07-12 Thread Riccardo Ferrari
What driver version are you using? You can look at the LoggingRetryPolicy to have more meaningful messages in your logs. best, On Tue, Jul 12, 2016 at 9:02 PM, Abhinav Solan wrote: > Thanks, Johnny > Actually, they were running .. it went through a series of read and

Re: (C)* stable version after 3.5

2016-07-12 Thread Riccardo Ferrari
You may want to read more about the Cassandra release process, see: http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/ On Tue, Jul 12, 2016 at 4:01 PM, Alain RODRIGUEZ wrote: > Hi, > > The only "fix" release after 3.5 is 3.7. Yet hard to say if it is more >

Re: DTCS SSTable count issue

2016-07-11 Thread Riccardo Ferrari
e.com > > 2016-07-07 19:25 GMT+02:00 Jeff Jirsa <jeff.ji...@crowdstrike.com>: > >> 48 sstables isn’t unreasonable in a DTCS table. It will continue to grow >> over time, but ideally data will expire as it nears your 90 day TTL and >> those tables should start dropping away as t

Re: Problems with cassandra on AWS

2016-07-11 Thread Riccardo Ferrari
I would check your security group settings, you need to allow communication on cassandra ports (ie 9042,...) On Mon, Jul 11, 2016 at 8:17 AM, daemeon reiydelle wrote: > xWell, I seem to recall that the private IP's are valid for communications > WITHIN one VPC. I assume you

Re: Is my cluster normal?

2016-07-07 Thread Riccardo Ferrari
Hi Yuan, Your machine instance has 4 vCPUs, that is 4 threads (not cores!!!). Aside from any Cassandra-specific discussion, a system load of 10 on a 4-thread machine is way too much in my opinion. If that is the running average system load I would look deeper into system details. Is that IO wait? Is
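To put a number on that remark, the load average can be divided by the number of hardware threads. The sketch below is only an illustration: load_per_cpu is a hypothetical helper name, and the figures are the ones quoted in the message (load 10, 4 vCPUs).

```shell
# Sketch: load average per hardware thread.
# load_per_cpu is a hypothetical helper; arguments are the load average
# and the number of vCPUs (threads).
load_per_cpu() {
  awk -v avg="$1" -v n="$2" 'BEGIN { printf "%.2f\n", avg / n }'
}

# Figures from the message: a load of 10 on a 4-thread machine.
load_per_cpu 10 4   # prints 2.50
```

A value above 1.00 means more runnable (or IO-blocked) tasks than threads on average; 2.50 here means each thread has roughly two and a half tasks competing for it. On a live Linux host the inputs could come from /proc/loadavg and nproc.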

DTCS SSTable count issue

2016-07-07 Thread Riccardo Ferrari
Hi everyone, This is my first question, apologies if I do something wrong. I have a small Cassandra cluster built on 3 nodes. Originally born as a 2.0.X cluster, it was upgraded to 2.0.15, then 2.1.13, and finally to 3.0.4, recently 3.0.6. Ubuntu is the OS. There are a few tables that have