Re: Bloom filter false positives high

2019-05-16 Thread Martin Mačura
I've decreased bloom_filter_fp_chance from 0.01 to 0.001.  The
sstableupgrade took 3 days to complete, and this is the result:
node1
   Bloom filter false positives: 380965
   Bloom filter false ratio: 0.46560
   Bloom filter space used: 27.1 MiB
   Bloom filter off heap memory used: 27.09 MiB
node2
   Bloom filter false positives: 866636
   Bloom filter false ratio: 0.40865
   Bloom filter space used: 27.78 MiB
   Bloom filter off heap memory used: 27.77 MiB
node3
   Bloom filter false positives: 433296
   Bloom filter false ratio: 0.20359
   Bloom filter space used: 26.15 MiB
   Bloom filter off heap memory used: 26.15 MiB
node4
   Bloom filter false positives: 550721
   Bloom filter false ratio: 0.30233
   Bloom filter space used: 24.7 MiB
   Bloom filter off heap memory used: 24.7 MiB
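
(A back-of-the-envelope check, using the textbook Bloom filter sizing
formula rather than Cassandra's exact implementation: bits per key is
roughly -ln(p) / (ln 2)^2, i.e. about 9.6 bits at p = 0.01 and about 14.4
bits at p = 0.001, a ~1.5x increase. The filters growing from ~18 MiB to
~27 MiB is roughly consistent with that, so the new filters do appear to
have been built; a false ratio still in the 0.2-0.47 range against a 0.001
target would then point at the read pattern rather than at undersized
filters.)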




Martin




On Wed, Apr 17, 2019 at 1:45 PM Stefan Miklosovic <
stefan.mikloso...@instaclustr.com> wrote:

> Lastly, I wonder if that number is the same from every node you
> connect your nodetool to. Do all nodes see a similar false
> positive ratio / count?
>
> On Wed, 17 Apr 2019 at 21:41, Stefan Miklosovic
>  wrote:
> >
> > One thing comes to my mind, but my reasoning is questionable as I am
> > not an expert in this.
> >
> > The whole point of a Bloom filter is to check whether some record is in
> > a particular SSTable. A false positive means that, obviously, the
> > filter thought it was there but in fact it is not, so Cassandra did a
> > lookup unnecessarily. Why does it think the data is there in so many
> > cases? Either you make a lot of requests for the same partition key
> > over time, querying the same data over and over again (but would not
> > that data be cached?), or a lot of data was written with the same
> > partition key, so the filter thinks it is there but the clustering
> > column is different. As ts is of type timeuuid, isn't it true that you
> > are doing a lot of queries by date? It may be that the hash is done
> > only on partition keys and not on clustering columns, so the filter
> > says "yes", Cassandra goes there, checks whether the clustering column
> > equals what you queried, and it is not there. But as I say, I might be
> > wrong ...
> >
> > On top of that, your read_repair_chance is 0.0, so it will never do a
> > read repair after a successful read (e.g. you have RF 3 and CL QUORUM,
> > and one node is somehow behind). If you don't run repairs, maybe the
> > data is just somehow unsynchronized - but that is really just my guess.
> >
> > On Wed, 17 Apr 2019 at 21:39, Martin Mačura  wrote:
> > >
> > > We cannot run any repairs on these tables.  Whenever we tried it
> (incremental or full or partitioner range), it caused a node to run out of
> disk space during anticompaction.  We'll try again once Cassandra 4.0 is
> released.
> > >
> > > On Wed, Apr 17, 2019 at 1:07 PM Stefan Miklosovic <
> stefan.mikloso...@instaclustr.com> wrote:
> > >>
> > >> When you invoke nodetool, it gets the false positive count from this metric:
> > >>
> > >>
> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/metrics/TableMetrics.java#L564-L578
> > >>
> > >> You get high false positives, so this is what accumulates them:
> > >>
> > >>
> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/metrics/TableMetrics.java#L572
> > >>
> > >> If you follow that, the number is computed here:
> > >>
> > >>
> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/io/sstable/BloomFilterTracker.java#L44-L55
> > >>
> > >> For that number to be so high, the difference has to be big, so
> > >> lastFalsePositiveCount is, imho, significantly lower.
> > >>
> > >> False positives are only ever increased in BigTableReader, where it
> > >> gets complicated very quickly, and I am not sure why it is called, to
> > >> be honest.
> > >>
> > >> Is everything fine with the db as such? Do you run repairs? Does that
> > >> number increase or decrease over time? Does repair or compaction have
> > >> some effect on it?
> > >>
> > >> On Wed, 17 Apr 2019 at 20:48, Martin Mačura 
> wrote:
> > >> >
> > >> > Both tables use the default bloom_filter_fp_chance of 0.01 ...
> > >> >
> > >>

Re: Bloom filter false positives high

2019-04-17 Thread Martin Mačura
We cannot run any repairs on these tables.  Whenever we tried it
(incremental or full or partitioner range), it caused a node to run out of
disk space during anticompaction.  We'll try again once Cassandra 4.0 is
released.

On Wed, Apr 17, 2019 at 1:07 PM Stefan Miklosovic <
stefan.mikloso...@instaclustr.com> wrote:

> When you invoke nodetool, it gets the false positive count from this metric:
>
>
> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/metrics/TableMetrics.java#L564-L578
>
> You get high false positives, so this is what accumulates them:
>
>
> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/metrics/TableMetrics.java#L572
>
> If you follow that, the number is computed here:
>
>
> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/io/sstable/BloomFilterTracker.java#L44-L55
>
> For that number to be so high, the difference has to be big, so
> lastFalsePositiveCount is, imho, significantly lower.
>
> False positives are only ever increased in BigTableReader, where it
> gets complicated very quickly, and I am not sure why it is called, to
> be honest.
>
> Is everything fine with the db as such? Do you run repairs? Does that
> number increase or decrease over time? Does repair or compaction have
> some effect on it?
>
> On Wed, 17 Apr 2019 at 20:48, Martin Mačura  wrote:
> >
> > Both tables use the default bloom_filter_fp_chance of 0.01 ...
> >
> > CREATE TABLE ... (
> >a int,
> >b int,
> >bucket timestamp,
> >ts timeuuid,
> >c int,
> > ...
> >PRIMARY KEY ((a, b, bucket), ts, c)
> > ) WITH CLUSTERING ORDER BY (ts DESC, c ASC)
> >AND bloom_filter_fp_chance = 0.01
> >AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
> 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS',
> 'tombstone_threshold': '0.9', 'unchecked_tombstone_compaction':
> > 'false'}
> >AND dclocal_read_repair_chance = 0.0
> >AND default_time_to_live = 63072000
> >AND gc_grace_seconds = 10800
> > ...
> >AND read_repair_chance = 0.0
> >AND speculative_retry = 'NONE';
> >
> >
> > CREATE TABLE ... (
> >c int,
> >b int,
> >bucket timestamp,
> >ts timeuuid,
> > ...
> >PRIMARY KEY ((c, b, bucket), ts)
> > ) WITH CLUSTERING ORDER BY (ts DESC)
> >AND bloom_filter_fp_chance = 0.01
> >AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
> 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS',
> 'tombstone_threshold': '0.9', 'unchecked_tombstone_compaction':
> > 'false'}
> >AND dclocal_read_repair_chance = 0.0
> >AND default_time_to_live = 63072000
> >AND gc_grace_seconds = 10800
> > ...
> >AND read_repair_chance = 0.0
> >AND speculative_retry = 'NONE';
> >
> > On Wed, Apr 17, 2019 at 12:25 PM Stefan Miklosovic <
> stefan.mikloso...@instaclustr.com> wrote:
> >>
> >> What is your bloom_filter_fp_chance for either table? I guess it is
> >> bigger for the first one; the bigger that number (between 0 and 1) is,
> >> the less memory the filter will use (17 MiB against 54.9 MiB), and the
> >> more false positives you will get.
> >>
> >> On Wed, 17 Apr 2019 at 19:59, Martin Mačura  wrote:
> >> >
> >> > Hi,
> >> > I have a table with poor bloom filter false ratio:
> >> >SSTable count: 1223
> >> >Space used (live): 726.58 GiB
> >> >Number of partitions (estimate): 8592749
> >> >Bloom filter false positives: 35796352
> >> >Bloom filter false ratio: 0.68472
> >> >Bloom filter space used: 17.82 MiB
> >> >Compacted partition maximum bytes: 386857368
> >> >
> >> > It's a time series, TWCS compaction, window size 1 day, data
> partitioned in daily buckets, TTL 2 years.
> >> >
> >> > I have another table with a similar schema, but it is not affected
> for some reason:
> >> >SSTable count: 1114
> >> >Space used (live): 329.87 GiB
> >> >Number of partitions (estimate): 25460768
> >> >Bloom filter false positives: 156942
> >> >Bloom filter false ratio: 0.00010
> >> >Bloom filter space used: 54.9 MiB
> >> >Compacted partition maximum bytes: 20924300
> >> >
> >> > Thanks for any advice,
> >> >
> >> > Martin
> >>
> >>
>
>
>


Re: Bloom filter false positives high

2019-04-17 Thread Martin Mačura
Both tables use the default bloom_filter_fp_chance of 0.01 ...

CREATE TABLE ... (
   a int,
   b int,
   bucket timestamp,
   ts timeuuid,
   c int,
...
   PRIMARY KEY ((a, b, bucket), ts, c)
) WITH CLUSTERING ORDER BY (ts DESC, c ASC)
   AND bloom_filter_fp_chance = 0.01
   AND compaction = {'class':
'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
'compaction_window_size': '1', 'compaction_window_unit': 'DAYS',
'tombstone_threshold': '0.9', 'unchecked_tombstone_compaction':
'false'}
   AND dclocal_read_repair_chance = 0.0
   AND default_time_to_live = 63072000
   AND gc_grace_seconds = 10800
...
   AND read_repair_chance = 0.0
   AND speculative_retry = 'NONE';


CREATE TABLE ... (
   c int,
   b int,
   bucket timestamp,
   ts timeuuid,
...
   PRIMARY KEY ((c, b, bucket), ts)
) WITH CLUSTERING ORDER BY (ts DESC)
   AND bloom_filter_fp_chance = 0.01
   AND compaction = {'class':
'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
'compaction_window_size': '1', 'compaction_window_unit': 'DAYS',
'tombstone_threshold': '0.9', 'unchecked_tombstone_compaction':
'false'}
   AND dclocal_read_repair_chance = 0.0
   AND default_time_to_live = 63072000
   AND gc_grace_seconds = 10800
...
   AND read_repair_chance = 0.0
   AND speculative_retry = 'NONE';
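
(If we end up lowering the fp chance on these tables, my understanding is
that the change only takes effect once the SSTables - and the Bloom filters
stored with them - are rewritten; a minimal sketch, with ks.events standing
in for the real keyspace/table:

   cqlsh -e "ALTER TABLE ks.events WITH bloom_filter_fp_chance = 0.001;"
   nodetool upgradesstables -a ks events
)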

On Wed, Apr 17, 2019 at 12:25 PM Stefan Miklosovic <
stefan.mikloso...@instaclustr.com> wrote:

> What is your bloom_filter_fp_chance for either table? I guess it is
> bigger for the first one; the bigger that number (between 0 and 1) is,
> the less memory the filter will use (17 MiB against 54.9 MiB), and the
> more false positives you will get.
>
> On Wed, 17 Apr 2019 at 19:59, Martin Mačura  wrote:
> >
> > Hi,
> > I have a table with poor bloom filter false ratio:
> >SSTable count: 1223
> >Space used (live): 726.58 GiB
> >Number of partitions (estimate): 8592749
> >Bloom filter false positives: 35796352
> >Bloom filter false ratio: 0.68472
> >Bloom filter space used: 17.82 MiB
> >Compacted partition maximum bytes: 386857368
> >
> > It's a time series, TWCS compaction, window size 1 day, data partitioned
> in daily buckets, TTL 2 years.
> >
> > I have another table with a similar schema, but it is not affected for
> some reason:
> >SSTable count: 1114
> >Space used (live): 329.87 GiB
> >Number of partitions (estimate): 25460768
> >Bloom filter false positives: 156942
> >Bloom filter false ratio: 0.00010
> >Bloom filter space used: 54.9 MiB
> >Compacted partition maximum bytes: 20924300
> >
> > Thanks for any advice,
> >
> > Martin
>
>
>


Bloom filter false positives high

2019-04-17 Thread Martin Mačura
Hi,
I have a table with poor bloom filter false ratio:
   SSTable count: 1223
   Space used (live): 726.58 GiB
   Number of partitions (estimate): 8592749
   Bloom filter false positives: 35796352
   Bloom filter false ratio: 0.68472
   Bloom filter space used: 17.82 MiB
   Compacted partition maximum bytes: 386857368

It's a time series, TWCS compaction, window size 1 day, data partitioned in
daily buckets, TTL 2 years.

I have another table with a similar schema, but it is not affected for some
reason:
   SSTable count: 1114
   Space used (live): 329.87 GiB
   Number of partitions (estimate): 25460768
   Bloom filter false positives: 156942
   Bloom filter false ratio: 0.00010
   Bloom filter space used: 54.9 MiB
   Compacted partition maximum bytes: 20924300
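
(For reference, numbers like the above are what nodetool tablestats
reports, e.g.:

   nodetool tablestats ks.events

with ks.events standing in for the real keyspace/table.)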

Thanks for any advice,

Martin


Re: TWCS + subrange repair = excessive re-compaction?

2018-09-25 Thread Martin Mačura
Most partitions in our dataset span one or two SSTables at most.  But
there might be a few that span hundreds of SSTables.  If I located and
deleted them (partition-level tombstone), would this fix the issue?
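
(Concretely, I mean a delete that names only the partition key, which
writes a single partition-level tombstone - sketched here with placeholder
partition-key columns and made-up values:

   cqlsh -e "DELETE FROM ks.events WHERE a = 1 AND b = 2 AND bucket = '2018-01-01';"

The tombstone itself would then still have to survive gc_grace_seconds
before compaction can purge it.)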

Thanks,

Martin
On Mon, Sep 24, 2018 at 1:08 PM Jeff Jirsa  wrote:
>
>
>
>
> On Sep 24, 2018, at 3:47 AM, Oleksandr Shulgin  
> wrote:
>
> On Mon, Sep 24, 2018 at 10:50 AM Jeff Jirsa  wrote:
>>
>> Do your partitions span time windows?
>
>
> Yes.
>
>
> The data structure used to know if data needs to be streamed (the merkle 
> tree) is only granular to - at best - a token, so even with subrange repair 
> if a byte is off, it’ll stream the whole partition, including parts of old 
> repaired sstables
>
> Incremental repair is smart enough not to diff or stream already repaired
> data, but the matrix of which versions allow subrange AND incremental
> repair isn't something I've memorized (I know it behaves the way you'd hope
> in trunk/4.0 after CASSANDRA-9143)




Re: TWCS + subrange repair = excessive re-compaction?

2018-09-24 Thread Martin Mačura
Hi,
I can confirm the same issue in Cassandra 3.11.2.

As an example:  a TWCS table that normally has 800 SSTables  (2 years'
worth of daily windows plus some anticompactions) will peak at
anywhere from 15k to 50k SSTables during a subrange repair.


Regards,

Martin
On Mon, Sep 24, 2018 at 9:34 AM Oleksandr Shulgin
 wrote:
>
> Hello,
>
> Our setup is as follows:
>
> Apache Cassandra: 3.0.17
> Cassandra Reaper: 1.3.0-BETA-20180830
> Compaction: {
>'class': 'TimeWindowCompactionStrategy',
>'compaction_window_size': '30',
>'compaction_window_unit': 'DAYS'
>  }
>
> We have two column families which differ only in the way data is written: one 
> is always with a TTL (of 2 years), the other -- without a TTL.  The data is 
> time-series-like, append-only, no explicit updates or deletes.  The data goes 
> back as far as ~15 months.
>
> We have scheduled a non-incremental repair using Cassandra Reaper to run 
> every week.
>
> Now we are observing an unexpected effect such that often *all* of the 
> SSTable files on disk are modified (touched by repair) for both of the TTLd 
> and non-TTLd tables.
>
> This is not expected, since the old files from past months have been 
> repeatedly repaired a number of times already.
>
> If it is an effect caused by over-streaming, why does Cassandra find any 
> differences in the files from past months in the first place?  We expect that 
> after a file from 2 months ago (or earlier) has been fully repaired once, 
> there is no possibility for any more differences to be discovered.
>
> Is this not a reasonable assumption?
>
> Regards,
> --
> Alex
>




Re: Anticompaction causing significant increase in disk usage

2018-09-12 Thread Martin Mačura
Hi Alain,
thank you for your response.

I'm using incremental repair. I'm afraid subrange repair is not a
viable alternative, because it's very slow - takes over a week to
complete.

I've found at least a partial solution - specifying the '-local' or '-dc'
parameter will also disable anticompaction, but the repair will skip
SSTables that are already marked as repaired. Our data is about 50%
repaired, so this significantly reduces repair time.
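
(That is, something along these lines - the keyspace name is a placeholder:

   nodetool repair -local ks
)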

What if I ran 'sstablerepairedset --really-set --is-repaired' on every
SSTable of a table that was repaired by a subrange repair?  Would it
prevent those SSTables from being anticompacted, and allow us to use
incremental repair again?
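
(Roughly what I have in mind - paths are placeholders, the service commands
depend on the install, and the node would be stopped while the flag is
flipped:

   nodetool drain
   sudo service cassandra stop
   sstablerepairedset --really-set --is-repaired /var/lib/cassandra/data/ks/events-*/*-big-Data.db
   sudo service cassandra start
)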


Regards,

Martin
On Wed, Sep 12, 2018 at 1:31 PM Alain RODRIGUEZ  wrote:
>
> Hello Martin,
>
> How do you perform the repairs?
>
> Are you using incremental repairs or full repairs but without subranges? Alex 
> described issues related to these repairs here: 
> http://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html.
>
> tl;dr:
>
>> The only way to perform repair without anticompaction in “modern” versions 
>> of Apache Cassandra is subrange repair, which fully skips anticompaction. To 
>> perform a subrange repair correctly, you have three options :
>> - Compute valid token subranges yourself and script repairs accordingly
>> - Use the Cassandra range repair script which performs subrange repair
>> - Use Cassandra Reaper, which also performs subrange repair
>
>
> If you can prevent anti-compaction, disk space growth should be more 
> predictable.
>
> There might be more solutions out there by now, and C* should also soon ship
> with a sidecar - it's being actively discussed. Finally, incremental repairs
> will receive important fixes in Cassandra 4.0; Alex wrote about this too
> (yes, this guy loves repairs ¯\_(ツ)_/¯):
> http://thelastpickle.com/blog/2018/09/10/incremental-repair-improvements-in-cassandra-4.html
>
> I believe (and hope) this information is relevant to help you fix this issue.
>
> C*heers,
> ---
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On Wed, 12 Sep 2018 at 10:14, Martin Mačura  wrote:
>>
>> Hi,
>> we're on cassandra 3.11.2 . During an anticompaction after repair,
>> TotalDiskSpaceUsed value of one table gradually went from 700GB to
>> 1180GB, and then suddenly jumped back to 700GB.  This happened on all
>> nodes involved in the repair. There was no change in PercentRepaired
>> during or after this process. SSTable count is currently 857, with a
>> peak of 2460 during the repair.
>>
>> Table is using TWCS with 1-day time window.  Most daily SSTables are
>> around 1 GB but the oldest one is 156 GB - caused by a major
>> compaction.
>>
>> system.log.6.zip:INFO  [CompactionExecutor:9923] 2018-09-10
>> 15:29:54,238 CompactionManager.java:649 - [repair
>> #88c36e30-b4cb-11e8-bebe-cd3efd73ed33] Starting anticompaction for ...
>> on 519 [...]  SSTables
>> ...
>> system.log:INFO  [CompactionExecutor:9923] 2018-09-12 00:29:39,262
>> CompactionManager.java:1524 - Anticompaction completed successfully,
>> anticompacted from 0 to 518 sstable(s).
>>
>> What could be the cause of the temporary increase, and how can we
>> prevent it?  We are concerned about running out of disk space soon.
>>
>> Thanks for any help
>>
>> Martin
>>
>>




Anticompaction causing significant increase in disk usage

2018-09-12 Thread Martin Mačura
Hi,
we're on cassandra 3.11.2 . During an anticompaction after repair,
TotalDiskSpaceUsed value of one table gradually went from 700GB to
1180GB, and then suddenly jumped back to 700GB.  This happened on all
nodes involved in the repair. There was no change in PercentRepaired
during or after this process. SSTable count is currently 857, with a
peak of 2460 during the repair.

Table is using TWCS with 1-day time window.  Most daily SSTables are
around 1 GB but the oldest one is 156 GB - caused by a major
compaction.

system.log.6.zip:INFO  [CompactionExecutor:9923] 2018-09-10
15:29:54,238 CompactionManager.java:649 - [repair
#88c36e30-b4cb-11e8-bebe-cd3efd73ed33] Starting anticompaction for ...
on 519 [...]  SSTables
...
system.log:INFO  [CompactionExecutor:9923] 2018-09-12 00:29:39,262
CompactionManager.java:1524 - Anticompaction completed successfully,
anticompacted from 0 to 518 sstable(s).

What could be the cause of the temporary increase, and how can we
prevent it?  We are concerned about running out of disk space soon.

Thanks for any help

Martin




Re: Cassandra 3.11 and subrange repairs

2018-07-31 Thread Martin Mačura
I am using this tool with 3.11, had to modify it to make it usable:

https://github.com/BrianGallew/cassandra_range_repair/pull/60

Martin

On Tue, Jul 31, 2018 at 3:44 PM Jean Carlo  wrote:
>
> Hello everyone,
>
> I am just wondering if someone is using this tool to make repairs in 
> cassandra 3.11
>
> https://github.com/BrianGallew/cassandra_range_repair
>
> Or everybody is using cassandra-reaper ? :)
>
> I am willing to use cassandra-reaper soon, but meanwhile I will just need to
> cron the repairs in the cluster.
>
>
> Actually, I want to know whether cassandra_range_repair works properly in 3.11,
> because its repository has not been very active lately
>
>
>
> Best greetings
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay




Re: Infinite loop of single SSTable compactions

2018-07-30 Thread Martin Mačura
Hi Rahul,

the table TTL is 24 months. The oldest data is 22 months old, so no
expirations yet.  Compacted partition maximum bytes: 17 GB - yeah, I
know that's not good, but we'll have to wait for the TTL to make it go
away.  More recent partitions are kept under 100 MB by bucketing.

The data model:
CREATE TABLE keyspace.table (
   group int,
   status int,
   bucket timestamp,
   ts timeuuid,
   source int,
...
   PRIMARY KEY ((group, status, bucket), ts, source)
) WITH CLUSTERING ORDER BY (ts DESC, source ASC)

There are no INSERT statements with the same 'ts' and 'source'
clustering columns.

Regards,

Martin
On Thu, Jul 26, 2018 at 12:16 PM Rahul Singh
 wrote:
>
> Few questions
>
>
> What is your maximum compacted partition bytes across the cluster for this table?
> What's your TTL?
> What does your data model look like, i.e. what's your PK?
>
> Rahul
> On Jul 25, 2018, 1:07 PM -0400, James Shaw , wrote:
>
> nodetool compactionstats  --- see which table is compacting
> nodetool cfstats keyspace_name.table_name  --- check partition size,
> tombstones
>
> Go to the data file directories: look at the data file sizes and
> timestamps --- compaction will write to a new temp file with _tmplink...,
>
> use sstablemetadata ...    look at the largest or oldest one first
>
> Of course, there may be other factors, like disk space, etc.
> Also check compaction_throughput_mb_per_sec in cassandra.yaml
>
> Hope it is helpful.
>
> Thanks,
>
> James
>
>
>
>
> On Wed, Jul 25, 2018 at 4:18 AM, Martin Mačura  wrote:
>>
>> Hi,
>> we have a table which is being compacted all the time, with no change in 
>> size:
>>
>> Compaction History:
>> compacted_at             bytes_in     bytes_out    rows_merged
>> 2018-07-25T05:26:48.101  57248063878  57248063878  {1:11655}
>> 2018-07-25T01:09:47.346  57248063878  57248063878  {1:11655}
>> 2018-07-24T20:52:48.652  57248063878  57248063878  {1:11655}
>> 2018-07-24T16:36:01.828  57248063878  57248063878  {1:11655}
>> 2018-07-24T12:11:00.026  57248063878  57248063878  {1:11655}
>> 2018-07-24T07:28:04.686  57248063878  57248063878  {1:11655}
>> 2018-07-24T02:47:15.290  57248063878  57248063878  {1:11655}
>> 2018-07-23T22:06:17.410  57248137921  57248063878  {1:11655}
>>
>> We tried setting unchecked_tombstone_compaction to false, but it had no effect.
>>
>> The data is a time series; there will be only a handful of cell
>> tombstones present. The table has a TTL, but it'll be at least a month
>> before it takes effect.
>>
>> Table properties:
>>AND compaction = {'class':
>> 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
>> 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS',
>> 'max_threshold': '32', 'min_threshold': '4',
>> 'unchecked_tombstone_compaction': 'false'}
>>AND compression = {'chunk_length_in_kb': '64', 'class':
>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>AND crc_check_chance = 1.0
>>AND dclocal_read_repair_chance = 0.0
>>AND default_time_to_live = 63072000
>>AND gc_grace_seconds = 10800
>>AND max_index_interval = 2048
>>AND memtable_flush_period_in_ms = 0
>>AND min_index_interval = 128
>>AND read_repair_chance = 0.0
>>AND speculative_retry = 'NONE';
>>
>> Thanks for any help
>>
>>
>> Martin
>>
>>
>




Infinite loop of single SSTable compactions

2018-07-25 Thread Martin Mačura
Hi,
we have a table which is being compacted all the time, with no change in size:

Compaction History:
compacted_at             bytes_in     bytes_out    rows_merged
2018-07-25T05:26:48.101  57248063878  57248063878  {1:11655}
2018-07-25T01:09:47.346  57248063878  57248063878  {1:11655}
2018-07-24T20:52:48.652  57248063878  57248063878  {1:11655}
2018-07-24T16:36:01.828  57248063878  57248063878  {1:11655}
2018-07-24T12:11:00.026  57248063878  57248063878  {1:11655}
2018-07-24T07:28:04.686  57248063878  57248063878  {1:11655}
2018-07-24T02:47:15.290  57248063878  57248063878  {1:11655}
2018-07-23T22:06:17.410  57248137921  57248063878  {1:11655}

We tried setting unchecked_tombstone_compaction to false, but it had no effect.

The data is a time series; there will be only a handful of cell
tombstones present. The table has a TTL, but it'll be at least a month
before it takes effect.
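
(One thing worth checking on the SSTable that keeps getting rewritten - the
path below is a placeholder:

   sstablemetadata /var/lib/cassandra/data/ks/events-*/mc-*-big-Data.db | grep -i droppable

which should show the estimated droppable tombstones figure that a
tombstone-driven compaction keys off.)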

Table properties:
   AND compaction = {'class':
'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
'compaction_window_size': '1', 'compaction_window_unit': 'DAYS',
'max_threshold': '32', 'min_threshold': '4',
'unchecked_tombstone_compaction': 'false'}
   AND compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
   AND crc_check_chance = 1.0
   AND dclocal_read_repair_chance = 0.0
   AND default_time_to_live = 63072000
   AND gc_grace_seconds = 10800
   AND max_index_interval = 2048
   AND memtable_flush_period_in_ms = 0
   AND min_index_interval = 128
   AND read_repair_chance = 0.0
   AND speculative_retry = 'NONE';

Thanks for any help


Martin




Re: How to identify which table causing Maximum Memory usage limit

2018-06-11 Thread Martin Mačura
Hi,
we've had this issue with large partitions (100 MB and more).  Use
nodetool tablehistograms to find partition sizes for each table.

If you have enough heap space to spare, try increasing this parameter:
file_cache_size_in_mb: 512

There's also the following parameter, but I did not test the impact yet:
buffer_pool_use_heap_if_exhausted: true
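
As an example of the tablehistograms call mentioned above - keyspace and
table names are placeholders:

   nodetool tablehistograms ks events

The partition size column in its output is a percentile distribution, so a
small 95th percentile combined with a huge maximum usually means a few
outlier partitions.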


Regards,

Martin


On Tue, Jun 5, 2018 at 3:54 PM, learner dba
 wrote:
> Hi,
>
> We see this message often; the cluster has multiple keyspaces and column
> families.
> How do I know which CF is causing this?
> Or could it be something else?
> Do we need to worry about this message?
>
> INFO  [CounterMutationStage-1] 2018-06-05 13:36:35,983 NoSpamLogger.java:91
> - Maximum memory usage reached (512.000MiB), cannot allocate chunk of
> 1.000MiB
>
>




Re: Repair slow, "Percent repaired" never updated

2018-06-06 Thread Martin Mačura
P.S.: Here's a corresponding log from the second node:

INFO  [AntiEntropyStage:1] 2018-06-04 13:37:16,409 Validator.java:281
- [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Sending completed
merkle tree to /14.0.53.234 for asm_log.event
INFO  [StreamReceiveTask:30] 2018-06-04 14:14:28,989
StreamResultFuture.java:187 - [Stream
#6244fd50-67ff-11e8-b07c-c365701888e8] Session with /14.0.53.234 is
complete
INFO  [StreamReceiveTask:30] 2018-06-04 14:14:28,990
StreamResultFuture.java:219 - [Stream
#6244fd50-67ff-11e8-b07c-c365701888e8] All sessions completed
INFO  [AntiEntropyStage:1] 2018-06-04 14:14:29,000
ActiveRepairService.java:452 - [repair
#af1aefc0-67c0-11e8-b07c-c365701888e8] Not a global repair, will not
do anticompaction


Why is there no anticompaction if it's an incremental repair?

We currently have two datacenters; this concerns the second one, which
we recently brought up (with nodetool rebuild). We cannot do a repair
across datacenters, because nodes in the old DC would run out of disk
space.

Regards,

Martin



On Tue, Jun 5, 2018 at 6:06 PM, Martin Mačura  wrote:
> Hi,
> we're on cassandra 3.11.2, and we're having some issues with repairs.
> They take ages to complete, and some time ago the incremental repair
> stopped working - that is, SSTables are not being marked as repaired,
> even though the repair reports success.
>
> Running a full or incremental repair does not make any difference.
>
> Here's a log of a typical repair (omitted a lot of 'Maximum memory
> usage' messages):
>
> INFO  [Repair-Task-12] 2018-06-04 06:29:50,396 RepairRunnable.java:139
> - Starting repair command #11 (af1aefc0-67c0-11e8-b07c-c365701888e8),
> repairing keyspace prod with repair options (parallelism: parallel,
> primary range: false, incremental: true, job threads: 1,
> ColumnFamilies: [event], dataCenters: [DC1], hosts: [], # of ranges:
> 1280, pull repair: false)
> INFO  [Repair-Task-12] 2018-06-04 06:29:51,497 RepairSession.java:228
> - [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] new session: will
> sync /14.0.53.234, /14.0.52.115 on range [...] for asm_log.[event]
> INFO  [Repair#11:1] 2018-06-04 06:29:51,776 RepairJob.java:169 -
> [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Requesting merkle trees
> for event (to [/14.0.52.115, /14.0.53.234])
> INFO  [ValidationExecutor:10] 2018-06-04 06:31:13,859
> NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB),
> cannot allocate chunk of 1.000MiB
> WARN  [PERIODIC-COMMIT-LOG-SYNCER] 2018-06-04 06:32:01,385
> NoSpamLogger.java:94 - Out of 14 commit log syncs over the past
> 134.02s with average duration of 34.90ms, 2 have exceeded the
> configured commit interval by an average of 60.66ms
> ...
> INFO  [ValidationExecutor:10] 2018-06-04 13:31:19,011
> NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB),
> cannot allocate chunk of 1.000MiB
> INFO  [AntiEntropyStage:1] 2018-06-04 13:37:17,357
> RepairSession.java:180 - [repair
> #afc2ef90-67c0-11e8-b07c-c365701888e8] Received merkle tree for event
> from /14.0.52.115
> INFO  [ValidationExecutor:10] 2018-06-04 13:46:19,281
> NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB),
> cannot allocate chunk of 1.000MiB
> INFO  [IndexSummaryManager:1] 2018-06-04 13:57:18,772
> IndexSummaryRedistribution.java:76 - Redistributing index summaries
> INFO  [AntiEntropyStage:1] 2018-06-04 13:58:21,971
> RepairSession.java:180 - [repair
> #afc2ef90-67c0-11e8-b07c-c365701888e8] Received merkle tree for event
> from /14.0.53.234
> INFO  [RepairJobTask:4] 2018-06-04 13:58:39,780 SyncTask.java:73 -
> [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Endpoints /14.0.52.115
> and /14.0.53.234 have 15406 range(s) out of sync for event
> INFO  [RepairJobTask:4] 2018-06-04 13:58:39,781 LocalSyncTask.java:71
> - [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Performing streaming
> repair of 15406 ranges with /14.0.52.115
> INFO  [RepairJobTask:4] 2018-06-04 13:59:49,075
> StreamResultFuture.java:90 - [Stream
> #6244fd50-67ff-11e8-b07c-c365701888e8] Executing streaming plan for
> Repair
> INFO  [StreamConnectionEstablisher:3] 2018-06-04 13:59:49,076
> StreamSession.java:266 - [Stream
> #6244fd50-67ff-11e8-b07c-c365701888e8] Starting streaming to
> /14.0.52.115
> INFO  [StreamConnectionEstablisher:3] 2018-06-04 13:59:49,089
> StreamCoordinator.java:264 - [Stream
> #6244fd50-67ff-11e8-b07c-c365701888e8, ID#0] Beginning stream session
> with /14.0.52.115
> INFO  [STREAM-IN-/14.0.52.115:7000] 2018-06-04 14:01:14,423
> StreamResultFuture.java:173 - [Stream
> #6244fd50-67ff-11e8-b07c-c365701888e8 ID#0] Prepare completed.
> Receiving 321 files(6.238GiB), sending 318 files(6.209GiB)
> WARN  [Service Thread] 2018-06-04 14:12:15,578 GCInspector.java:282 -
> ConcurrentMarkSweep GC in 4095ms.  CMS O

Repair slow, "Percent repaired" never updated

2018-06-05 Thread Martin Mačura
Hi,
we're on cassandra 3.11.2, and we're having some issues with repairs.
They take ages to complete, and some time ago the incremental repair
stopped working - that is, SSTables are not being marked as repaired,
even though the repair reports success.

Running a full or incremental repair does not make any difference.

Here's a log of a typical repair (omitted a lot of 'Maximum memory
usage' messages):

INFO  [Repair-Task-12] 2018-06-04 06:29:50,396 RepairRunnable.java:139
- Starting repair command #11 (af1aefc0-67c0-11e8-b07c-c365701888e8),
repairing keyspace prod with repair options (parallelism: parallel,
primary range: false, incremental: true, job threads: 1,
ColumnFamilies: [event], dataCenters: [DC1], hosts: [], # of ranges:
1280, pull repair: false)
INFO  [Repair-Task-12] 2018-06-04 06:29:51,497 RepairSession.java:228
- [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] new session: will
sync /14.0.53.234, /14.0.52.115 on range [...] for asm_log.[event]
INFO  [Repair#11:1] 2018-06-04 06:29:51,776 RepairJob.java:169 -
[repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Requesting merkle trees
for event (to [/14.0.52.115, /14.0.53.234])
INFO  [ValidationExecutor:10] 2018-06-04 06:31:13,859
NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB),
cannot allocate chunk of 1.000MiB
WARN  [PERIODIC-COMMIT-LOG-SYNCER] 2018-06-04 06:32:01,385
NoSpamLogger.java:94 - Out of 14 commit log syncs over the past
134.02s with average duration of 34.90ms, 2 have exceeded the
configured commit interval by an average of 60.66ms
...
INFO  [ValidationExecutor:10] 2018-06-04 13:31:19,011
NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB),
cannot allocate chunk of 1.000MiB
INFO  [AntiEntropyStage:1] 2018-06-04 13:37:17,357
RepairSession.java:180 - [repair
#afc2ef90-67c0-11e8-b07c-c365701888e8] Received merkle tree for event
from /14.0.52.115
INFO  [ValidationExecutor:10] 2018-06-04 13:46:19,281
NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB),
cannot allocate chunk of 1.000MiB
INFO  [IndexSummaryManager:1] 2018-06-04 13:57:18,772
IndexSummaryRedistribution.java:76 - Redistributing index summaries
INFO  [AntiEntropyStage:1] 2018-06-04 13:58:21,971
RepairSession.java:180 - [repair
#afc2ef90-67c0-11e8-b07c-c365701888e8] Received merkle tree for event
from /14.0.53.234
INFO  [RepairJobTask:4] 2018-06-04 13:58:39,780 SyncTask.java:73 -
[repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Endpoints /14.0.52.115
and /14.0.53.234 have 15406 range(s) out of sync for event
INFO  [RepairJobTask:4] 2018-06-04 13:58:39,781 LocalSyncTask.java:71
- [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Performing streaming
repair of 15406 ranges with /14.0.52.115
INFO  [RepairJobTask:4] 2018-06-04 13:59:49,075
StreamResultFuture.java:90 - [Stream
#6244fd50-67ff-11e8-b07c-c365701888e8] Executing streaming plan for
Repair
INFO  [StreamConnectionEstablisher:3] 2018-06-04 13:59:49,076
StreamSession.java:266 - [Stream
#6244fd50-67ff-11e8-b07c-c365701888e8] Starting streaming to
/14.0.52.115
INFO  [StreamConnectionEstablisher:3] 2018-06-04 13:59:49,089
StreamCoordinator.java:264 - [Stream
#6244fd50-67ff-11e8-b07c-c365701888e8, ID#0] Beginning stream session
with /14.0.52.115
INFO  [STREAM-IN-/14.0.52.115:7000] 2018-06-04 14:01:14,423
StreamResultFuture.java:173 - [Stream
#6244fd50-67ff-11e8-b07c-c365701888e8 ID#0] Prepare completed.
Receiving 321 files(6.238GiB), sending 318 files(6.209GiB)
WARN  [Service Thread] 2018-06-04 14:12:15,578 GCInspector.java:282 -
ConcurrentMarkSweep GC in 4095ms.  CMS Old Gen: 4086661264 ->
1107272664; Par Eden Space: 503316480 -> 0; Par Survivor Space:
21541464 -> 0
...
WARN  [GossipTasks:1] 2018-06-04 14:12:15,677 FailureDetector.java:288
- Not marking nodes down due to local pause of 5123793157 > 50
INFO  [ScheduledTasks:1] 2018-06-04 14:12:20,611 NoSpamLogger.java:91
- Some operations were slow, details available at debug level
(debug.log)
INFO  [STREAM-IN-/14.0.52.115:7000] 2018-06-04 14:14:29,188
StreamResultFuture.java:187 - [Stream
#6244fd50-67ff-11e8-b07c-c365701888e8] Session with /14.0.52.115 is
complete
INFO  [STREAM-IN-/14.0.52.115:7000] 2018-06-04 14:14:29,190
StreamResultFuture.java:219 - [Stream
#6244fd50-67ff-11e8-b07c-c365701888e8] All sessions completed
INFO  [STREAM-IN-/14.0.52.115:7000] 2018-06-04 14:14:29,190
LocalSyncTask.java:121 - [repair
#afc2ef90-67c0-11e8-b07c-c365701888e8] Sync complete using session
afc2ef90-67c0-11e8-b07c-c365701888e8 between /14.0.52.115 and
/14.0.53.234 on event
INFO  [RepairJobTask:5] 2018-06-04 14:14:29,191 RepairJob.java:143 -
[repair #afc2ef90-67c0-11e8-b07c-c365701888e8] event is fully synced
INFO  [RepairJobTask:5] 2018-06-04 14:14:29,193 RepairSession.java:270
- [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Session completed
successfully



Tablestats:
   SSTable count: 714
   Space used (live): 489416489322
   Space used (total): 489416489322
   Space used by snapshots (total): 0
  

Re: Nodes unresponsive after upgrade 3.9 -> 3.11.2

2018-03-23 Thread Martin Mačura
Never mind, we resolved the issue - the JVM heap settings were misconfigured.
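
(For reference, the relevant knobs live in cassandra-env.sh / jvm.options;
a sketch only, with illustrative sizes rather than a recommendation:

   # cassandra-env.sh
   MAX_HEAP_SIZE="8G"
   HEAP_NEWSIZE="2G"
)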

Martin

On Fri, Mar 23, 2018 at 1:18 PM, Martin Mačura <m.mac...@gmail.com> wrote:
> Hi all,
>
> We have a cluster of 3 nodes with RF 3 that ran fine until we upgraded
> it to 3.11.2.
>
> Each node has 32 GB RAM, 8 GB Cassandra heap size.
>
> After the upgrade, clients started reporting connection issues:
>
> cassandra | [ERROR] Closing established connection pool to host
>  because of the following error: Read error 'connection
> reset by peer' (src/pool.cpp:384)
> cassandra | [ERROR] Unable to establish a control connection to host
>  because of the following error: Error: 'Request timed out'
> (0x010E) (src/control_connection.cpp:263)
>
>
> Cassandra logs are full of garbage collection warnings:
>
> WARN  [Service Thread] 2018-03-23 05:04:17,780 GCInspector.java:282 -
> ConcurrentMarkSweep GC in 7858ms.  Par Eden Space: 6871908352 ->
> 1774446288; Par Survivor Space: 858980344 -> 0
> INFO  [Service Thread] 2018-03-23 05:04:17,780 StatusLogger.java:47 -
> Pool NameActive   Pending  Completed   Blocked
>  All Time Blocked
> INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
> MutationStage10 92526002 0
> 0
> INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
> ViewMutationStage 0 0  0 0
> 0
> INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
> ReadStage 2 2 943544 0
> 0
> INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
> RequestResponseStage  0 01666876 0
> 0
> INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
> ReadRepairStage   0 0  10362 0
> 0
> INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
> CounterMutationStage  0 0  0 0
> 0
> INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
> MiscStage 0 0  0 0
> 0
> INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
> CompactionExecutor0 0   3076 0
> 0
> INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
> MemtableReclaimMemory 0 0 44 0
> 0
> INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
> PendingRangeCalculator0 0  4 0
> 0
> INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
> GossipStage   0 0  14287 0
> 0
> INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
> SecondaryIndexManagement  0 0  0 0
> 0
> INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
> HintsDispatcher   0 0  1 0
> 0
> INFO  [Service Thread] 2018-03-23 05:04:17,804 StatusLogger.java:51 -
> PerDiskMemtableFlushWriter_1 0 0 37
>  0 0
> INFO  [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 -
> PerDiskMemtableFlushWriter_2 0 0 37
>  0 0
> INFO  [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 -
> MigrationStage0 0  2 0
> 0
> INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
> MemtablePostFlush 0 0 72 0
> 0
> INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
> PerDiskMemtableFlushWriter_0 0 0 44
>  0 0
> INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
> ValidationExecutor0 0  0 0
> 0
> INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
> Sampler   0 0  0 0
> 0
> INFO  [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
> MemtableFlushWriter   0 0 44 0
> 0
> INFO  [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
> PerDiskMemtableFlushWriter_5 0 

Nodes unresponsive after upgrade 3.9 -> 3.11.2

2018-03-23 Thread Martin Mačura
Hi all,

We have a cluster of 3 nodes with RF 3 that ran fine until we upgraded
it to 3.11.2.

Each node has 32 GB RAM, 8 GB Cassandra heap size.

After the upgrade, clients started reporting connection issues:

cassandra | [ERROR] Closing established connection pool to host
 because of the following error: Read error 'connection
reset by peer' (src/pool.cpp:384)
cassandra | [ERROR] Unable to establish a control connection to host
 because of the following error: Error: 'Request timed out'
(0x010E) (src/control_connection.cpp:263)


Cassandra logs are full of garbage collection warnings:

WARN  [Service Thread] 2018-03-23 05:04:17,780 GCInspector.java:282 -
ConcurrentMarkSweep GC in 7858ms.  Par Eden Space: 6871908352 ->
1774446288; Par Survivor Space: 858980344 -> 0
INFO  [Service Thread] 2018-03-23 05:04:17,780 StatusLogger.java:47 -
Pool NameActive   Pending  Completed   Blocked
 All Time Blocked
INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
MutationStage10 92526002 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
ViewMutationStage 0 0  0 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
ReadStage 2 2 943544 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
RequestResponseStage  0 01666876 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
ReadRepairStage   0 0  10362 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
CounterMutationStage  0 0  0 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
MiscStage 0 0  0 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
CompactionExecutor0 0   3076 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
MemtableReclaimMemory 0 0 44 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
PendingRangeCalculator0 0  4 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
GossipStage   0 0  14287 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
SecondaryIndexManagement  0 0  0 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
HintsDispatcher   0 0  1 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,804 StatusLogger.java:51 -
PerDiskMemtableFlushWriter_1 0 0 37
 0 0
INFO  [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 -
PerDiskMemtableFlushWriter_2 0 0 37
 0 0
INFO  [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 -
MigrationStage0 0  2 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
MemtablePostFlush 0 0 72 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
PerDiskMemtableFlushWriter_0 0 0 44
 0 0
INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
ValidationExecutor0 0  0 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
Sampler   0 0  0 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
MemtableFlushWriter   0 0 44 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
PerDiskMemtableFlushWriter_5 0 0 37
 0 0
INFO  [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
InternalResponseStage 0 0  0 0
0
INFO  [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 -
PerDiskMemtableFlushWriter_3 0 0 37
 0 0
INFO  [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 -
PerDiskMemtableFlushWriter_4 0 0 37
 0 0
INFO  [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
AntiEntropyStage  0  

Re: Rebuild to a new DC fails every time

2018-01-11 Thread Martin Mačura
Thanks for the tips, Alan.  The cluster is entirely healthy. But the
connection between DCs is a VPN, managed by a third party - it is
possible it might be flaky. However, I would expect the rebuild job to
be able to recover from connection timeout/reset type of errors
without a need for manual intervention.

In the end we opted for restore from snapshot + repair, to bring up
the node in the new DC.  We'll see how that goes.

Regards,

Martin




Re: Rebuild to a new DC fails every time

2018-01-08 Thread Martin Mačura
None of the files is listed more than once in the logs:

java.lang.RuntimeException: Transfer of file
/fs3/cassandra/data//event_group-3b5782d08e4411e6842917253f111990/mc-116042-big-Data.db
already completed or aborted (perhaps session failed?).
java.lang.RuntimeException: Transfer of file
/fs0/cassandra/data//event_group-3b5782d08e4411e6842917253f111990/mc-111370-big-Data.db
already completed or aborted (perhaps session failed?).
java.lang.RuntimeException: Transfer of file
/fs3/cassandra/data//event_alert-13d78e3f11e6a6cbe1698349da4d/mc-8659-big-Data.db
already completed or aborted (perhaps session failed?).
java.lang.RuntimeException: Transfer of file
/fs4/cassandra/data//event_alert-13d78e3f11e6a6cbe1698349da4d/mc-9133-big-Data.db
already completed or aborted (perhaps session failed?).
java.lang.RuntimeException: Transfer of file
/fs2/cassandra/data//event_alert-13d78e3f11e6a6cbe1698349da4d/mc-3997-big-Data.db
already completed or aborted (perhaps session failed?).
java.lang.RuntimeException: Transfer of file
/fs1/cassandra/data///event_group-3b5782d08e4411e6842917253f111990/mc-152979-big-Data.db
already completed or aborted (perhaps session failed?).




On Mon, Jan 8, 2018 at 2:21 AM, kurt greaves <k...@instaclustr.com> wrote:
> If you're on 3.9 it's likely unrelated as streaming_socket_timeout_in_ms is
> 48 hours. It appears rebuild is trying to stream the same file twice. Are there
> other exceptions in the logs related to the file, or can you find out if
> it's previously been sent by the same session? Search the logs for the file
> that failed and post back any exceptions.
>
> On 29 December 2017 at 10:18, Martin Mačura <m.mac...@gmail.com> wrote:
>>
>> Is this something that can be resolved by CASSANDRA-11841?
>>
>> Thanks,
>>
>> Martin
>>
>> On Thu, Dec 21, 2017 at 3:02 PM, Martin Mačura <m.mac...@gmail.com> wrote:
>> > Hi all,
>> > we are trying to add a new datacenter to the existing cluster, but the
>> > 'nodetool rebuild' command always fails after a couple of hours.
>> >
>> > We're on Cassandra 3.9.
>> >
>> > Example 1:
>> >
>> > 172.24.16.169 INFO  [STREAM-IN-/172.25.16.125:55735] 2017-12-13
>> > 23:55:38,840 StreamResultFuture.java:174 - [Stream
>> > #b8faf130-e092-11e7-bab5-0d4fb7c90e72 ID#0] Prepare completed.
>> > Receiving 0 files(0.000KiB), sending 9844 files(885.587GiB)
>> > 172.25.16.125 INFO  [STREAM-IN-/172.24.16.169:7000] 2017-12-13
>> > 23:55:38,858 StreamResultFuture.java:174 - [Stream
>> > #b8faf130-e092-11e7-bab5-0d4fb7c90e72 ID#0] Prepare completed.
>> > Receiving 9844 files(885.587GiB), sending 0 files(0.000KiB)
>> >
>> > 172.24.16.169 ERROR [STREAM-IN-/172.25.16.125:55735] 2017-12-14
>> > 04:28:09,064 StreamSession.java:533 - [Stream
>> > #b8faf130-e092-11e7-bab5-0d4fb7c90e72] Streaming error occurred on
>> > session with peer 172.25.16.125
>> > 172.24.16.169 java.io.IOException: Connection reset by peer
>> >
>> > 172.24.16.169 ERROR [STREAM-OUT-/172.25.16.125:49412] 2017-12-14
>> > 07:26:26,832 StreamSession.java:533 - [Stream
>> > #b8faf130-e092-11e7-bab5-0d4fb7c90e72] Streaming error occurred on
>> > session with peer 172.25.16.125
>> > 172.24.16.169 java.lang.RuntimeException: Transfer of file
>> > -13d78e3f11e6a6cbe1698349da4d/mc-8659-big-Data.db
>> > already completed or aborted (perhaps session failed?).
>> > 172.25.16.125 ERROR [STREAM-OUT-/172.24.16.169:7000] 2017-12-14
>> > 07:26:50,004 StreamSession.java:533 - [Stream
>> > #b8faf130-e092-11e7-bab5-0d4fb7c90e72] Streaming error occurred on
>> > session with peer 172.24.16.169
>> > 172.25.16.125 java.io.IOException: Connection reset by peer
>> >
>> > Example 2:
>> >
>> > 172.24.16.169 INFO  [STREAM-IN-/172.25.16.125:35202] 2017-12-18
>> > 03:24:31,423 StreamResultFuture.java:174 - [Stream
>> > #95d36300-e3d4-11e7-a90b-2b89506ad2af ID#0] Prepare completed.
>> > Receiving 0 files(0.000KiB), sending 12312 files(895.973GiB)
>> > 172.25.16.125 INFO  [STREAM-IN-/172.24.16.169:7000] 2017-12-18
>> > 03:24:31,441 StreamResultFuture.java:174 - [Stream
>> > #95d36300-e3d4-11e7-a90b-2b89506ad2af ID#0] Prepare completed.
>> > Receiving 12312 files(895.973GiB), sending 0 files(0.000KiB)
>> >
>> > 172.24.16.169 ERROR [STREAM-IN-/172.25.16.125:35202] 2017-12-18
>> > 06:39:42,049 StreamSession.java:533 - [Stream
>> > #95d36300-e3d4-11e7-a90b-2b89506ad2af] Streaming error occurred on
>> > session with peer 172.25.16.125
>> > 172.24.16.169 java.io.IOException: Connection

Re: Rebuild to a new DC fails every time

2017-12-29 Thread Martin Mačura
Is this something that can be resolved by CASSANDRA-11841?

Thanks,

Martin

On Thu, Dec 21, 2017 at 3:02 PM, Martin Mačura <m.mac...@gmail.com> wrote:
> Hi all,
> we are trying to add a new datacenter to the existing cluster, but the
> 'nodetool rebuild' command always fails after a couple of hours.
>
> We're on Cassandra 3.9.
>
> Example 1:
>
> 172.24.16.169 INFO  [STREAM-IN-/172.25.16.125:55735] 2017-12-13
> 23:55:38,840 StreamResultFuture.java:174 - [Stream
> #b8faf130-e092-11e7-bab5-0d4fb7c90e72 ID#0] Prepare completed.
> Receiving 0 files(0.000KiB), sending 9844 files(885.587GiB)
> 172.25.16.125 INFO  [STREAM-IN-/172.24.16.169:7000] 2017-12-13
> 23:55:38,858 StreamResultFuture.java:174 - [Stream
> #b8faf130-e092-11e7-bab5-0d4fb7c90e72 ID#0] Prepare completed.
> Receiving 9844 files(885.587GiB), sending 0 files(0.000KiB)
>
> 172.24.16.169 ERROR [STREAM-IN-/172.25.16.125:55735] 2017-12-14
> 04:28:09,064 StreamSession.java:533 - [Stream
> #b8faf130-e092-11e7-bab5-0d4fb7c90e72] Streaming error occurred on
> session with peer 172.25.16.125
> 172.24.16.169 java.io.IOException: Connection reset by peer
>
> 172.24.16.169 ERROR [STREAM-OUT-/172.25.16.125:49412] 2017-12-14
> 07:26:26,832 StreamSession.java:533 - [Stream
> #b8faf130-e092-11e7-bab5-0d4fb7c90e72] Streaming error occurred on
> session with peer 172.25.16.125
> 172.24.16.169 java.lang.RuntimeException: Transfer of file
> -13d78e3f11e6a6cbe1698349da4d/mc-8659-big-Data.db
> already completed or aborted (perhaps session failed?).
> 172.25.16.125 ERROR [STREAM-OUT-/172.24.16.169:7000] 2017-12-14
> 07:26:50,004 StreamSession.java:533 - [Stream
> #b8faf130-e092-11e7-bab5-0d4fb7c90e72] Streaming error occurred on
> session with peer 172.24.16.169
> 172.25.16.125 java.io.IOException: Connection reset by peer
>
> Example 2:
>
> 172.24.16.169 INFO  [STREAM-IN-/172.25.16.125:35202] 2017-12-18
> 03:24:31,423 StreamResultFuture.java:174 - [Stream
> #95d36300-e3d4-11e7-a90b-2b89506ad2af ID#0] Prepare completed.
> Receiving 0 files(0.000KiB), sending 12312 files(895.973GiB)
> 172.25.16.125 INFO  [STREAM-IN-/172.24.16.169:7000] 2017-12-18
> 03:24:31,441 StreamResultFuture.java:174 - [Stream
> #95d36300-e3d4-11e7-a90b-2b89506ad2af ID#0] Prepare completed.
> Receiving 12312 files(895.973GiB), sending 0 files(0.000KiB)
>
> 172.24.16.169 ERROR [STREAM-IN-/172.25.16.125:35202] 2017-12-18
> 06:39:42,049 StreamSession.java:533 - [Stream
> #95d36300-e3d4-11e7-a90b-2b89506ad2af] Streaming error occurred on
> session with peer 172.25.16.125
> 172.24.16.169 java.io.IOException: Connection reset by peer
>
> 172.24.16.169 ERROR [STREAM-OUT-/172.25.16.125:42744] 2017-12-18
> 09:25:36,188 StreamSession.java:533 - [Stream
> #95d36300-e3d4-11e7-a90b-2b89506ad2af] Streaming error occurred on
> session with peer 172.25.16.125
> 172.24.16.169 java.lang.RuntimeException: Transfer of file
> -3b5782d08e4411e6842917253f111990/mc-152979-big-Data.db
> already completed or aborted (perhaps session failed?).
> 172.25.16.125 ERROR [STREAM-OUT-/172.24.16.169:7000] 2017-12-18
> 09:25:59,447 StreamSession.java:533 - [Stream
> #95d36300-e3d4-11e7-a90b-2b89506ad2af] Streaming error occurred on
> session with peer 172.24.16.169
> 172.25.16.125 java.io.IOException: Connection timed out
>
> Datacenter: PRIMARY
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  AddressLoad   Tokens   Owns (effective)  Host ID
> Rack
> UN  172.24.16.169  918.31 GiB  256  100.0%
> bc4a980b-cca6-4ca2-b32f-f8206d48e14c  RAC1
> UN  172.24.16.170  908.76 GiB  256  100.0%
> 37b2742e-c83a-4341-896f-09d244810e69  RAC1
> UN  172.24.16.171  908.44 GiB  256  100.0%
> 6dc2b9d8-75dd-48f8-858c-53b1af42e8fb  RAC1
> Datacenter: SECONDARY
> =
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  AddressLoad   Tokens   Owns (effective)  Host ID
> Rack
> UN  172.25.16.125  27.48 GiB  256  100.0%
> 1e1669eb-cfd2-4718-a073-558946a8c947  RAC2
> UN  172.25.16.124  28.24 GiB  256  100.0%
> 896d9894-10c8-4269-9476-5ddab3c8abe9  RAC2
>
> Any ideas?
>
> Thanks,
>
> Martin




Rebuild to a new DC fails every time

2017-12-21 Thread Martin Mačura
Hi all,
we are trying to add a new datacenter to the existing cluster, but the
'nodetool rebuild' command always fails after a couple of hours.

We're on Cassandra 3.9.

Example 1:

172.24.16.169 INFO  [STREAM-IN-/172.25.16.125:55735] 2017-12-13
23:55:38,840 StreamResultFuture.java:174 - [Stream
#b8faf130-e092-11e7-bab5-0d4fb7c90e72 ID#0] Prepare completed.
Receiving 0 files(0.000KiB), sending 9844 files(885.587GiB)
172.25.16.125 INFO  [STREAM-IN-/172.24.16.169:7000] 2017-12-13
23:55:38,858 StreamResultFuture.java:174 - [Stream
#b8faf130-e092-11e7-bab5-0d4fb7c90e72 ID#0] Prepare completed.
Receiving 9844 files(885.587GiB), sending 0 files(0.000KiB)

172.24.16.169 ERROR [STREAM-IN-/172.25.16.125:55735] 2017-12-14
04:28:09,064 StreamSession.java:533 - [Stream
#b8faf130-e092-11e7-bab5-0d4fb7c90e72] Streaming error occurred on
session with peer 172.25.16.125
172.24.16.169 java.io.IOException: Connection reset by peer

172.24.16.169 ERROR [STREAM-OUT-/172.25.16.125:49412] 2017-12-14
07:26:26,832 StreamSession.java:533 - [Stream
#b8faf130-e092-11e7-bab5-0d4fb7c90e72] Streaming error occurred on
session with peer 172.25.16.125
172.24.16.169 java.lang.RuntimeException: Transfer of file
-13d78e3f11e6a6cbe1698349da4d/mc-8659-big-Data.db
already completed or aborted (perhaps session failed?).
172.25.16.125 ERROR [STREAM-OUT-/172.24.16.169:7000] 2017-12-14
07:26:50,004 StreamSession.java:533 - [Stream
#b8faf130-e092-11e7-bab5-0d4fb7c90e72] Streaming error occurred on
session with peer 172.24.16.169
172.25.16.125 java.io.IOException: Connection reset by peer

Example 2:

172.24.16.169 INFO  [STREAM-IN-/172.25.16.125:35202] 2017-12-18
03:24:31,423 StreamResultFuture.java:174 - [Stream
#95d36300-e3d4-11e7-a90b-2b89506ad2af ID#0] Prepare completed.
Receiving 0 files(0.000KiB), sending 12312 files(895.973GiB)
172.25.16.125 INFO  [STREAM-IN-/172.24.16.169:7000] 2017-12-18
03:24:31,441 StreamResultFuture.java:174 - [Stream
#95d36300-e3d4-11e7-a90b-2b89506ad2af ID#0] Prepare completed.
Receiving 12312 files(895.973GiB), sending 0 files(0.000KiB)

172.24.16.169 ERROR [STREAM-IN-/172.25.16.125:35202] 2017-12-18
06:39:42,049 StreamSession.java:533 - [Stream
#95d36300-e3d4-11e7-a90b-2b89506ad2af] Streaming error occurred on
session with peer 172.25.16.125
172.24.16.169 java.io.IOException: Connection reset by peer

172.24.16.169 ERROR [STREAM-OUT-/172.25.16.125:42744] 2017-12-18
09:25:36,188 StreamSession.java:533 - [Stream
#95d36300-e3d4-11e7-a90b-2b89506ad2af] Streaming error occurred on
session with peer 172.25.16.125
172.24.16.169 java.lang.RuntimeException: Transfer of file
-3b5782d08e4411e6842917253f111990/mc-152979-big-Data.db
already completed or aborted (perhaps session failed?).
172.25.16.125 ERROR [STREAM-OUT-/172.24.16.169:7000] 2017-12-18
09:25:59,447 StreamSession.java:533 - [Stream
#95d36300-e3d4-11e7-a90b-2b89506ad2af] Streaming error occurred on
session with peer 172.24.16.169
172.25.16.125 java.io.IOException: Connection timed out

Datacenter: PRIMARY
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  AddressLoad   Tokens   Owns (effective)  Host ID
Rack
UN  172.24.16.169  918.31 GiB  256  100.0%
bc4a980b-cca6-4ca2-b32f-f8206d48e14c  RAC1
UN  172.24.16.170  908.76 GiB  256  100.0%
37b2742e-c83a-4341-896f-09d244810e69  RAC1
UN  172.24.16.171  908.44 GiB  256  100.0%
6dc2b9d8-75dd-48f8-858c-53b1af42e8fb  RAC1
Datacenter: SECONDARY
=
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  AddressLoad   Tokens   Owns (effective)  Host ID
Rack
UN  172.25.16.125  27.48 GiB  256  100.0%
1e1669eb-cfd2-4718-a073-558946a8c947  RAC2
UN  172.25.16.124  28.24 GiB  256  100.0%
896d9894-10c8-4269-9476-5ddab3c8abe9  RAC2

Any ideas?

Thanks,

Martin
