nodes are always out of sync

2017-03-30 Thread Roland Otta
hi, we see the following behaviour in our environment: cluster consists of 6 nodes (cassandra version 3.0.7). keyspace has a replication factor 3. clients are writing data to the keyspace with consistency one. we are doing parallel, incremental repairs with cassandra reaper. even if a repair

repair performance

2017-03-17 Thread Roland Otta
hello, we are quite inexperienced with cassandra at the moment and are playing around with a new cluster we built up for getting familiar with cassandra and its possibilities. while getting familiar with that topic we recognized that repairs in our cluster take a long time. To get an idea of our

Re: repair performance

2017-03-17 Thread Roland Otta
ings, load of the "old nodes" of the cluster. This is a quite individual problem you have to track down individually. 2017-03-17 22:07 GMT+01:00 Roland Otta <roland.o...@willhaben.at>: hello, we are quite inexperienced with cassandra at the moment a

Re: repair performance

2017-03-17 Thread Roland Otta
... maybe i should just try increasing the job threads with --job-threads shame on me On Fri, 2017-03-17 at 21:30 +, Roland Otta wrote: forgot to mention the version we are using: we are using 3.0.7 - so i guess we should have incremental repairs by default. it also prints out

Re: repair performance

2017-03-17 Thread Roland Otta
did not recognize that so far. thank you for the hint. i will definitely give it a try On Fri, 2017-03-17 at 22:32 +0100, benjamin roth wrote: The fork from thelastpickle is. I'd recommend to give it a try over pure nodetool. 2017-03-17 22:30 GMT+01:00 Roland Otta <roland.o...@willhaben

Re: repair performance

2017-03-20 Thread Roland Otta
ummit-2016 From: Roland Otta <roland.o...@willhaben.at> Date: Friday, March 17, 2017 at 5:47 PM To: "user@cassandra.apache.org" <user@cassandra.apache.org> Subject: Re: repair performance did not recognize that so far. thank you for the hint. i will definitely give it a try O

spikes in blocked native transport requests

2017-03-20 Thread Roland Otta
we have a datacenter which is currently used exclusively for spark batch jobs. in case batch jobs are running against that environment we can see very high peaks in blocked native transport requests (up to 10k / minute). i am concerned because i guess that will slow other queries (in case other

Re: Inconsistent data after adding a new DC and rebuilding

2017-04-10 Thread Roland Otta
Hi, we have seen similar issues here. have you verified that your rebuilds have been finished successfully? we have seen rebuilds that stopped streaming and working but have not finished. what does nodetool netstats output for your newly built up nodes? br, roland On Mon, 2017-04-10 at 17:15

Re: WriteTimeoutException with LWT after few milliseconds

2017-04-12 Thread Roland Otta
sorry .. ignore my comment ... i missed your comment that the record is in the table ... On Wed, 2017-04-12 at 16:48 +0200, Roland Otta wrote: Hi Benjamin, it's unlikely that i can assist you .. but nevertheless ... i give it a try ;-) what's your consistency level for the insert? what if one

Re: WriteTimeoutException with LWT after few milliseconds

2017-04-12 Thread Roland Otta
Hi Benjamin, it's unlikely that i can assist you .. but nevertheless ... i give it a try ;-) what's your consistency level for the insert? what if one or more nodes are marked down and proper consistency can't be achieved? of course the error message does not indicate that problem (as it says its

cassandra node stops streaming data during nodetool rebuild

2017-04-07 Thread Roland Otta
hi, we are trying to set up a new datacenter and are initializing the data with nodetool rebuild. after some hours it seems that the node stopped streaming (at least there is no more streaming traffic on the network interface). nodetool netstats shows that the streaming is still in progress

Re: cassandra node stops streaming data during nodetool rebuild

2017-04-07 Thread Roland Otta
checked the pending compactions. there are no pending compactions at the moment. bg - roland otta On Fri, 2017-04-07 at 06:47 -0400, Jacob Shadix wrote: What version are you running? Do you see any errors in the system.log (SocketTimeout, for instance)? And what values do you have for the following

Re: cassandra node stops streaming data during nodetool rebuild

2017-04-07 Thread Roland Otta
at 7:16 AM, Roland Otta <roland.o...@willhaben.at> wrote: Hi! we are on 3.7. we have some debug messages ... but i guess they are not related to that issue DEBUG [GossipStage:1] 2017-04-07 13:11:00,440 FailureDetector.java:456 - Ignoring i

force processing of pending hinted handoffs

2017-04-11 Thread Roland Otta
hi, sometimes we have the problem that we have hinted handoffs (for example because of network problems between 2 DCs) that do not get processed even if the connection problem between the dcs recovers. Some of the files stay in the hints directory until we restart the node that contains the

Re: Inconsistent data after adding a new DC and rebuilding

2017-04-11 Thread Roland Otta
22565129 n/a On Mon, Apr 10, 2017 at 5:28 PM, Roland Otta <roland.o...@willhaben.at> wrote: Hi, we have seen similar issues here. have you verified that your rebuilds have been finished successfully? we have seen rebuilds that stopp

hanging validation compaction

2017-04-13 Thread Roland Otta
hi, we have the following issue on our 3.10 development cluster. we are doing regular repairs with thelastpickle's fork of creaper. sometimes the repair (it is a full repair in that case) hangs because of a stuck validation compaction. nodetool compactionstats gives me

Re: hanging validation compaction

2017-04-13 Thread Roland Otta
and is not related to my config changes On Thu, 2017-04-13 at 11:58 +0200, benjamin roth wrote: If you restart the server the same validation completes successfully? If not, have you tried scrubbing the affected sstables? 2017-04-13 11:43 GMT+02:00 Roland Otta <roland.o...@willhaben.at<mailto:ro

Re: force processing of pending hinted handoffs

2017-04-13 Thread Roland Otta
/HintedHandOffManagerMBean.html but every time i try invoking that operation i get an UnsupportedOperationException (tried it with hostname, ip and host-id as parameters - every time the same exception) On Tue, 2017-04-11 at 07:40 +, Roland Otta wrote: > hi, > > sometimes we have th

Re: hanging validation compaction

2017-04-13 Thread Roland Otta
connect to the node with JConsole and see where the compaction thread is stuck 2017-04-13 8:34 GMT+02:00 Roland Otta <roland.o...@willhaben.at>: hi, we have the following issue on our 3.10 development cluster. we are doing regular repairs with thelastp

Re: force processing of pending hinted handoffs

2017-04-13 Thread Roland Otta
rote: There is a nodetool command to resume hints. Maybe that helps? Am 13.04.2017 09:42 schrieb "Roland Otta" <roland.o...@willhaben.at>: oh ... the operation is deprecated according to the docs ... On Thu, 2017-04-13 at 07:40 +, Roland O

Re: hanging validation compaction

2017-04-13 Thread Roland Otta
reading from an SSTable. Unfortunately I am no caffeine expert. It looks like the read is cached and after the read caffeine tries to drain the cache and this is stuck. I don't see the reason from that stack trace. Someone had to dig deeper into caffeine to find the root cause. 2017-04-13 9:27 GMT+02:00

Re: hanging validation compaction

2017-04-13 Thread Roland Otta
you which sstable is being scrubbed. 2017-04-13 15:07 GMT+02:00 Roland Otta <roland.o...@willhaben.at>: i made a copy and also have the permission to upload sstables for that particular column_family is it possible to track down which sstable of that cf

Re: force processing of pending hinted handoffs

2017-04-13 Thread Roland Otta
oh ... the operation is deprecated according to the docs ... On Thu, 2017-04-13 at 07:40 +, Roland Otta wrote: > i figured out that there is an mbean > org.apache.cassandra.db.type=HintedHandoffManager with the operation > scheduleHintDelivery > > i guess thats wh

Re: hanging validation compaction

2017-04-13 Thread Roland Otta
reproduction case for the issue - you should copy the sstable away for further testing. Are you allowed to upload the broken sstable to JIRA? 2017-04-13 13:15 GMT+02:00 Roland Otta <roland.o...@willhaben.at>: sorry .. i have to correct myself .. the problem st

Re: hanging validation compaction

2017-04-13 Thread Roland Otta
/899929247.run(Unknown Source) java.lang.Thread.run(Thread.java:745) br, roland On Thu, 2017-04-13 at 10:04 +, Roland Otta wrote: i did 2 restarts before which did not help after that i have set for testing purposes file_cache_size_in_mb: 0 and buffer_pool_use_heap_if_exhausted: false

Re: nodes are always out of sync

2017-04-01 Thread Roland Otta
n your read/writes use each or local quorum for both. Chris On Thu, Mar 30, 2017 at 1:22 AM, Roland Otta <roland.o...@willhaben.at> wrote: hi, we see the following behaviour in our environment: cluster consists of 6 nodes (cassandra version 3.0.7). k