Re: Bloom filter false positives high
I've decreased bloom_filter_fp_chance from 0.01 to 0.001. The sstableupgrade took 3 days to complete. Here are the results:

node1
   Bloom filter false positives: 380965
   Bloom filter false ratio: 0.46560
   Bloom filter space used: 27.1 MiB
   Bloom filter off heap memory used: 27.09 MiB
node2
   Bloom filter false positives: 866636
   Bloom filter false ratio: 0.40865
   Bloom filter space used: 27.78 MiB
   Bloom filter off heap memory used: 27.77 MiB
node3
   Bloom filter false positives: 433296
   Bloom filter false ratio: 0.20359
   Bloom filter space used: 26.15 MiB
   Bloom filter off heap memory used: 26.15 MiB
node4
   Bloom filter false positives: 550721
   Bloom filter false ratio: 0.30233
   Bloom filter space used: 24.7 MiB
   Bloom filter off heap memory used: 24.7 MiB

Martin

On Wed, Apr 17, 2019 at 1:45 PM Stefan Miklosovic <stefan.mikloso...@instaclustr.com> wrote:
>
> Lastly, I wonder if that number is the same from every node you connect nodetool to. Do all nodes see a very similar false positive ratio / count?
>
> On Wed, 17 Apr 2019 at 21:41, Stefan Miklosovic wrote:
> >
> > One thing comes to mind, but my reasoning is questionable as I am not an expert in this.
> >
> > If you think about it, the whole concept of a Bloom filter is to check whether some record is in a particular SSTable. A false positive means that, obviously, the filter thought it was there but in fact it is not, so Cassandra did a lookup unnecessarily. Why does it think the record is there in so many cases? Either you make a lot of requests for the same partition key over time, querying the same data over and over again (but wouldn't that data be cached?), or a lot of data was written with the same partition key but a different clustering column, so the filter thinks it is there. As ts is of type timeuuid, isn't it true that you are doing a lot of queries with some date?
> > It might be that the hash is computed only on partition keys and not on clustering columns, so the filter says "yes", Cassandra goes there, checks whether the clustering column equals what you queried, and it is not there. But as I say, I might be wrong ...
> >
> > More to it, your read_repair_chance is 0.0, so it will never do a repair after a successful read (e.g. you have RF 3 and CL QUORUM, so one node is somehow behind). If you don't run repairs, maybe the data is just somehow unsynchronized, but that is really just my guess.
> >
> > On Wed, 17 Apr 2019 at 21:39, Martin Mačura wrote:
> > >
> > > We cannot run any repairs on these tables. Whenever we tried it (incremental or full or partitioner range), it caused a node to run out of disk space during anticompaction. We'll try again once Cassandra 4.0 is released.
> > >
> > > On Wed, Apr 17, 2019 at 1:07 PM Stefan Miklosovic <stefan.mikloso...@instaclustr.com> wrote:
> > >>
> > >> If you invoke nodetool, it gets the false positives number from this metric:
> > >>
> > >> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/metrics/TableMetrics.java#L564-L578
> > >>
> > >> You get high false positives, so this accumulates them:
> > >>
> > >> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/metrics/TableMetrics.java#L572
> > >>
> > >> If you follow that, the number is computed here:
> > >>
> > >> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/io/sstable/BloomFilterTracker.java#L44-L55
> > >>
> > >> For that number to be so high, the difference has to be big, so lastFalsePositiveCount is, IMHO, significantly lower.
> > >>
> > >> False positives are only ever incremented in BigTableReader, where it gets complicated very quickly, and I am not sure why it is called, to be honest.
> > >>
> > >> Is all fine with the db as such? Do you run repairs? Does that number increase or decrease over time? Does repair or compaction have some effect on it?
> > >>
> > >> On Wed, 17 Apr 2019 at 20:48, Martin Mačura wrote:
> > >> >
> > >> > Both tables use the default bloom_filter_fp_chance of 0.01 ...
> > >> >
> > >>
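[Editor's note: for intuition about the numbers in this thread, the textbook Bloom filter sizing formulas show roughly how much memory a given fp_chance costs. This is not Cassandra's exact implementation (each SSTable gets its own filter, and sizes get rounded), just a back-of-the-envelope sketch:]

```python
import math

def bloom_filter_size(n_keys: int, fp_chance: float):
    """Textbook Bloom filter sizing: bits m and hash count k needed for
    n keys at a target false-positive probability p."""
    m_bits = -n_keys * math.log(fp_chance) / (math.log(2) ** 2)
    k_hashes = (m_bits / n_keys) * math.log(2)
    return m_bits, k_hashes

# Partition count estimate from the affected table's tablestats output
n = 8_592_749
for p in (0.01, 0.001):
    m, k = bloom_filter_size(n, p)
    print(f"p={p}: ~{m / 8 / 1024**2:.1f} MiB, k={round(k)} hash functions")
```

The observed filter sizes (17.82 MiB at 0.01, ~27 MiB at 0.001) are larger than this single-filter estimate, which is expected: partitions overlap many SSTables, so the total key count across per-SSTable filters exceeds the partition estimate.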
Re: Bloom filter false positives high
We cannot run any repairs on these tables. Whenever we tried it (incremental or full or partitioner range), it caused a node to run out of disk space during anticompaction. We'll try again once Cassandra 4.0 is released.

On Wed, Apr 17, 2019 at 1:07 PM Stefan Miklosovic <stefan.mikloso...@instaclustr.com> wrote:
>
> If you invoke nodetool, it gets the false positives number from this metric:
>
> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/metrics/TableMetrics.java#L564-L578
>
> You get high false positives, so this accumulates them:
>
> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/metrics/TableMetrics.java#L572
>
> If you follow that, the number is computed here:
>
> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/io/sstable/BloomFilterTracker.java#L44-L55
>
> For that number to be so high, the difference has to be big, so lastFalsePositiveCount is, IMHO, significantly lower.
>
> False positives are only ever incremented in BigTableReader, where it gets complicated very quickly, and I am not sure why it is called, to be honest.
>
> Is all fine with the db as such? Do you run repairs? Does that number increase or decrease over time? Does repair or compaction have some effect on it?
>
> On Wed, 17 Apr 2019 at 20:48, Martin Mačura wrote:
> >
> > Both tables use the default bloom_filter_fp_chance of 0.01 ...
> >
> > CREATE TABLE ... (
> >    a int,
> >    b int,
> >    bucket timestamp,
> >    ts timeuuid,
> >    c int,
> >    ...
> >    PRIMARY KEY ((a, b, bucket), ts, c)
> > ) WITH CLUSTERING ORDER BY (ts DESC, monitor ASC)
> >    AND bloom_filter_fp_chance = 0.01
> >    AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS', 'tombstone_threshold': '0.9', 'unchecked_tombstone_compaction': 'false'}
> >    AND dclocal_read_repair_chance = 0.0
> >    AND default_time_to_live = 63072000
> >    AND gc_grace_seconds = 10800
> >    ...
> >    AND read_repair_chance = 0.0
> >    AND speculative_retry = 'NONE';
> >
> > CREATE TABLE ... (
> >    c int,
> >    b int,
> >    bucket timestamp,
> >    ts timeuuid,
> >    ...
> >    PRIMARY KEY ((c, b, bucket), ts)
> > ) WITH CLUSTERING ORDER BY (ts DESC)
> >    AND bloom_filter_fp_chance = 0.01
> >    AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS', 'tombstone_threshold': '0.9', 'unchecked_tombstone_compaction': 'false'}
> >    AND dclocal_read_repair_chance = 0.0
> >    AND default_time_to_live = 63072000
> >    AND gc_grace_seconds = 10800
> >    ...
> >    AND read_repair_chance = 0.0
> >    AND speculative_retry = 'NONE';
> >
> > On Wed, Apr 17, 2019 at 12:25 PM Stefan Miklosovic <stefan.mikloso...@instaclustr.com> wrote:
> >>
> >> What is your bloom_filter_fp_chance for either table? I guess it is bigger for the first one; the bigger that number is between 0 and 1, the less memory it will use (17 MiB against 54.9 MiB), which means more false positives you will get.
> >>
> >> On Wed, 17 Apr 2019 at 19:59, Martin Mačura wrote:
> >> >
> >> > Hi,
> >> > I have a table with poor bloom filter false ratio:
> >> >    SSTable count: 1223
> >> >    Space used (live): 726.58 GiB
> >> >    Number of partitions (estimate): 8592749
> >> >    Bloom filter false positives: 35796352
> >> >    Bloom filter false ratio: 0.68472
> >> >    Bloom filter space used: 17.82 MiB
> >> >    Compacted partition maximum bytes: 386857368
> >> >
> >> > It's a time series, TWCS compaction, window size 1 day, data partitioned in daily buckets, TTL 2 years.
> >> >
> >> > I have another table with a similar schema, but it is not affected for some reason:
> >> >    SSTable count: 1114
> >> >    Space used (live): 329.87 GiB
> >> >    Number of partitions (estimate): 25460768
> >> >    Bloom filter false positives: 156942
> >> >    Bloom filter false ratio: 0.00010
> >> >    Bloom filter space used: 54.9 MiB
> >> >    Compacted partition maximum bytes: 20924300
> >> >
> >> > Thanks for any advice,
> >> >
> >> > Martin
> >>
> >> -
> >> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> >> For additional commands, e-mail: user-h...@cassandra.apache.org
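[Editor's note: the BloomFilterTracker code Stefan links to boils down to computing a "recent" ratio from deltas of cumulative counters since the last read. A rough Python paraphrase of that idea (my sketch, not the actual Cassandra implementation):]

```python
class BloomFilterTracker:
    """Sketch of a delta-based false-ratio tracker: cumulative true/false
    positive counters, with 'recent' values taken as the change since the
    previous query of the tracker."""
    def __init__(self):
        self.false_positives = 0
        self.true_positives = 0
        self._last_fp = 0
        self._last_tp = 0

    def add_false_positive(self):
        self.false_positives += 1

    def add_true_positive(self):
        self.true_positives += 1

    def recent_false_ratio(self):
        # Deltas since the last call, then advance the snapshot.
        fp = self.false_positives - self._last_fp
        tp = self.true_positives - self._last_tp
        self._last_fp = self.false_positives
        self._last_tp = self.true_positives
        return 0.0 if fp + tp == 0 else fp / (fp + tp)

t = BloomFilterTracker()
for _ in range(68):
    t.add_false_positive()
for _ in range(32):
    t.add_true_positive()
print(t.recent_false_ratio())  # 0.68 -- same shape as the 0.68472 nodetool reports
```

The point of the delta scheme is that a persistently high reported ratio means false positives keep accruing between reads, not that one old spike is being re-reported.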
Re: Bloom filter false positives high
Both tables use the default bloom_filter_fp_chance of 0.01 ...

CREATE TABLE ... (
   a int,
   b int,
   bucket timestamp,
   ts timeuuid,
   c int,
   ...
   PRIMARY KEY ((a, b, bucket), ts, c)
) WITH CLUSTERING ORDER BY (ts DESC, monitor ASC)
   AND bloom_filter_fp_chance = 0.01
   AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS', 'tombstone_threshold': '0.9', 'unchecked_tombstone_compaction': 'false'}
   AND dclocal_read_repair_chance = 0.0
   AND default_time_to_live = 63072000
   AND gc_grace_seconds = 10800
   ...
   AND read_repair_chance = 0.0
   AND speculative_retry = 'NONE';

CREATE TABLE ... (
   c int,
   b int,
   bucket timestamp,
   ts timeuuid,
   ...
   PRIMARY KEY ((c, b, bucket), ts)
) WITH CLUSTERING ORDER BY (ts DESC)
   AND bloom_filter_fp_chance = 0.01
   AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS', 'tombstone_threshold': '0.9', 'unchecked_tombstone_compaction': 'false'}
   AND dclocal_read_repair_chance = 0.0
   AND default_time_to_live = 63072000
   AND gc_grace_seconds = 10800
   ...
   AND read_repair_chance = 0.0
   AND speculative_retry = 'NONE';

On Wed, Apr 17, 2019 at 12:25 PM Stefan Miklosovic <stefan.mikloso...@instaclustr.com> wrote:
>
> What is your bloom_filter_fp_chance for either table? I guess it is bigger for the first one; the bigger that number is between 0 and 1, the less memory it will use (17 MiB against 54.9 MiB), which means more false positives you will get.
>
> On Wed, 17 Apr 2019 at 19:59, Martin Mačura wrote:
> >
> > Hi,
> > I have a table with poor bloom filter false ratio:
> >    SSTable count: 1223
> >    Space used (live): 726.58 GiB
> >    Number of partitions (estimate): 8592749
> >    Bloom filter false positives: 35796352
> >    Bloom filter false ratio: 0.68472
> >    Bloom filter space used: 17.82 MiB
> >    Compacted partition maximum bytes: 386857368
> >
> > It's a time series, TWCS compaction, window size 1 day, data partitioned in daily buckets, TTL 2 years.
> >
> > I have another table with a similar schema, but it is not affected for some reason:
> >    SSTable count: 1114
> >    Space used (live): 329.87 GiB
> >    Number of partitions (estimate): 25460768
> >    Bloom filter false positives: 156942
> >    Bloom filter false ratio: 0.00010
> >    Bloom filter space used: 54.9 MiB
> >    Compacted partition maximum bytes: 20924300
> >
> > Thanks for any advice,
> >
> > Martin
Bloom filter false positives high
Hi,
I have a table with poor bloom filter false ratio:
   SSTable count: 1223
   Space used (live): 726.58 GiB
   Number of partitions (estimate): 8592749
   Bloom filter false positives: 35796352
   Bloom filter false ratio: 0.68472
   Bloom filter space used: 17.82 MiB
   Compacted partition maximum bytes: 386857368

It's a time series, TWCS compaction, window size 1 day, data partitioned in daily buckets, TTL 2 years.

I have another table with a similar schema, but it is not affected for some reason:
   SSTable count: 1114
   Space used (live): 329.87 GiB
   Number of partitions (estimate): 25460768
   Bloom filter false positives: 156942
   Bloom filter false ratio: 0.00010
   Bloom filter space used: 54.9 MiB
   Compacted partition maximum bytes: 20924300

Thanks for any advice,

Martin
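[Editor's note: the "data partitioned in daily buckets" scheme above is usually done client-side by truncating the event timestamp to its day and using that as the `bucket` partition-key component. A minimal sketch, assuming UTC days; the helper name is hypothetical:]

```python
from datetime import datetime, timezone

def day_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to its UTC day -- the 'bucket' component of a
    composite partition key like ((a, b, bucket), ts, ...)."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)

event_time = datetime(2019, 4, 17, 13, 45, tzinfo=timezone.utc)
print(day_bucket(event_time).isoformat())  # 2019-04-17T00:00:00+00:00
```

With a 1-day TWCS window, a daily bucket keeps each partition's writes mostly confined to one SSTable window, which is why large multi-window partitions are the pathological case.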
Re: TWCS + subrange repair = excessive re-compaction?
Most partitions in our dataset span one or two SSTables at most. But there might be a few that span hundreds of SSTables. If I located and deleted them (partition-level tombstone), would this fix the issue?

Thanks,

Martin

On Mon, Sep 24, 2018 at 1:08 PM Jeff Jirsa wrote:
>
> On Sep 24, 2018, at 3:47 AM, Oleksandr Shulgin wrote:
> > On Mon, Sep 24, 2018 at 10:50 AM Jeff Jirsa wrote:
> >>
> >> Do your partitions span time windows?
> >
> > Yes.
>
> The data structure used to know if data needs to be streamed (the merkle tree) is only granular to - at best - a token, so even with subrange repair, if a byte is off, it'll stream the whole partition, including parts of old repaired sstables.
>
> Incremental repair is smart enough not to diff or stream already repaired data, but the matrix of which versions allow subrange AND incremental repair isn't something I've memorized (I know it behaves the way you'd hope in trunk/4.0 after Cassandra-9143).
Re: TWCS + subrange repair = excessive re-compaction?
Hi,
I can confirm the same issue in Cassandra 3.11.2. As an example: a TWCS table that normally has 800 SSTables (2 years' worth of daily windows plus some anticompactions) will peak at anywhere from 15k to 50k SSTables during a subrange repair.

Regards,

Martin

On Mon, Sep 24, 2018 at 9:34 AM Oleksandr Shulgin wrote:
>
> Hello,
>
> Our setup is as follows:
>
> Apache Cassandra: 3.0.17
> Cassandra Reaper: 1.3.0-BETA-20180830
> Compaction: {
>    'class': 'TimeWindowCompactionStrategy',
>    'compaction_window_size': '30',
>    'compaction_window_unit': 'DAYS'
> }
>
> We have two column families which differ only in the way data is written: one is always written with a TTL (of 2 years), the other without a TTL. The data is time-series-like, append-only, no explicit updates or deletes. The data goes back as far as ~15 months.
>
> We have scheduled a non-incremental repair using Cassandra Reaper to run every week.
>
> Now we are observing an unexpected effect such that often *all* of the SSTable files on disk are modified (touched by repair) for both the TTLd and non-TTLd tables.
>
> This is not expected, since the old files from past months have been repeatedly repaired a number of times already.
>
> If it is an effect caused by over-streaming, why does Cassandra find any differences in the files from past months in the first place? We expect that after a file from 2 months ago (or earlier) has been fully repaired once, there is no possibility for any more differences to be discovered.
>
> Is this not a reasonable assumption?
>
> Regards,
> --
> Alex
Re: Anticompaction causing significant increase in disk usage
Hi Alain,
thank you for your response. I'm using incremental repair. I'm afraid subrange repair is not a viable alternative, because it's very slow - it takes over a week to complete.

I've found at least a partial solution - specifying the '-local' or '-dc' parameter will also disable anticompaction, but the repair will skip SSTables that are already marked as repaired. Our data is about 50% repaired, so this significantly reduces repair time.

What if I ran 'sstablerepairedset --really-set --is-repaired' on every table that was repaired by a subrange repair? Would it prevent these tables from being anticompacted, and allow us to use incremental repair again?

Regards,

Martin

On Wed, Sep 12, 2018 at 1:31 PM Alain RODRIGUEZ wrote:
>
> Hello Martin,
>
> How do you perform the repairs?
>
> Are you using incremental repairs, or full repairs but without subranges? Alex described issues related to these repairs here: http://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html.
>
> tl;dr:
>
>> The only way to perform repair without anticompaction in "modern" versions of Apache Cassandra is subrange repair, which fully skips anticompaction. To perform a subrange repair correctly, you have three options:
>> - Compute valid token subranges yourself and script repairs accordingly
>> - Use the Cassandra range repair script which performs subrange repair
>> - Use Cassandra Reaper, which also performs subrange repair
>
> If you can prevent anticompaction, disk space growth should be more predictable.
>
> There might be more solutions out there now; C* should also soon be shipped with a sidecar, which is being actively discussed. Finally, incremental repairs will receive important fixes in Cassandra 4.0; Alex also wrote about this (yes, this guy loves repairs ¯\_(ツ)_/¯): http://thelastpickle.com/blog/2018/09/10/incremental-repair-improvements-in-cassandra-4.html
>
> I believe (and hope) this information is relevant to help you fix this issue.
>
> C*heers,
> ---
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> Le mer. 12 sept. 2018 à 10:14, Martin Mačura a écrit :
>>
>> Hi,
>> we're on cassandra 3.11.2. During an anticompaction after repair, the TotalDiskSpaceUsed value of one table gradually went from 700GB to 1180GB, and then suddenly jumped back to 700GB. This happened on all nodes involved in the repair. There was no change in PercentRepaired during or after this process. SSTable count is currently 857, with a peak of 2460 during the repair.
>>
>> The table is using TWCS with a 1-day time window. Most daily SSTables are around 1 GB, but the oldest one is 156 GB - caused by a major compaction.
>>
>> system.log.6.zip:INFO [CompactionExecutor:9923] 2018-09-10 15:29:54,238 CompactionManager.java:649 - [repair #88c36e30-b4cb-11e8-bebe-cd3efd73ed33] Starting anticompaction for ... on 519 [...] SSTables
>> ...
>> system.log:INFO [CompactionExecutor:9923] 2018-09-12 00:29:39,262 CompactionManager.java:1524 - Anticompaction completed successfully, anticompacted from 0 to 518 sstable(s).
>>
>> What could be the cause of the temporary increase, and how can we prevent it? We are concerned about running out of disk space soon.
>>
>> Thanks for any help
>>
>> Martin
Anticompaction causing significant increase in disk usage
Hi,
we're on cassandra 3.11.2. During an anticompaction after repair, the TotalDiskSpaceUsed value of one table gradually went from 700GB to 1180GB, and then suddenly jumped back to 700GB. This happened on all nodes involved in the repair. There was no change in PercentRepaired during or after this process. SSTable count is currently 857, with a peak of 2460 during the repair.

The table is using TWCS with a 1-day time window. Most daily SSTables are around 1 GB, but the oldest one is 156 GB - caused by a major compaction.

system.log.6.zip:INFO [CompactionExecutor:9923] 2018-09-10 15:29:54,238 CompactionManager.java:649 - [repair #88c36e30-b4cb-11e8-bebe-cd3efd73ed33] Starting anticompaction for ... on 519 [...] SSTables
...
system.log:INFO [CompactionExecutor:9923] 2018-09-12 00:29:39,262 CompactionManager.java:1524 - Anticompaction completed successfully, anticompacted from 0 to 518 sstable(s).

What could be the cause of the temporary increase, and how can we prevent it? We are concerned about running out of disk space soon.

Thanks for any help

Martin

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org
Re: Cassandra 3.11 and subrange repairs
I am using this tool with 3.11; I had to modify it to make it usable: https://github.com/BrianGallew/cassandra_range_repair/pull/60

Martin

On Tue, Jul 31, 2018 at 3:44 PM Jean Carlo wrote:
>
> Hello everyone,
>
> I am just wondering if someone is using this tool to make repairs in cassandra 3.11:
>
> https://github.com/BrianGallew/cassandra_range_repair
>
> Or is everybody using cassandra-reaper? :)
>
> I am willing to use cassandra-reaper soon, but meanwhile I will just need to cron the repairs in the cluster.
>
> Actually, I want to know if cassandra_range_repair works properly in 3.11, because its repository is not active so far.
>
> Best greetings
>
> Jean Carlo
>
> "The best way to predict the future is to invent it"  Alan Kay
Re: Infinite loop of single SSTable compactions
Hi Rahul,
the table TTL is 24 months. The oldest data is 22 months old, so no expirations yet.

Compacted partition maximum bytes: 17 GB - yeah, I know that's not good, but we'll have to wait for the TTL to make it go away. More recent partitions are kept under 100 MB by bucketing.

The data model:

CREATE TABLE keyspace.table (
   group int,
   status int,
   bucket timestamp,
   ts timeuuid,
   source int,
   ...
   PRIMARY KEY ((group, status, bucket), ts, source)
) WITH CLUSTERING ORDER BY (ts DESC, monitor ASC)

There are no INSERT statements with the same 'ts' and 'source' clustering columns.

Regards,

Martin

On Thu, Jul 26, 2018 at 12:16 PM Rahul Singh wrote:
>
> Few questions:
>
> What is your maximumcompactedbytes across the cluster for this table?
> What's your TTL?
> What does your data model look like, as in what's your PK?
>
> Rahul
> On Jul 25, 2018, 1:07 PM -0400, James Shaw , wrote:
>
> nodetool compactionstats --- see which table is compacting
> nodetool cfstats keyspace_name.table_name --- check partition size, tombstones
>
> Go to the data file directories: look at the data file sizes and timestamps --- compaction will write to a new temp file with _tmplink...
>
> Use sstablemetadata ... look at the largest or oldest one first.
>
> Of course, there may be other factors, like disk space, etc. Also check compaction_throughput_mb_per_sec in cassandra.yaml.
>
> Hope it is helpful.
>
> Thanks,
>
> James
>
> On Wed, Jul 25, 2018 at 4:18 AM, Martin Mačura wrote:
>>
>> Hi,
>> we have a table which is being compacted all the time, with no change in size:
>>
>> Compaction History:
>> compacted_at             bytes_in     bytes_out    rows_merged
>> 2018-07-25T05:26:48.101  57248063878  57248063878  {1:11655}
>> 2018-07-25T01:09:47.346  57248063878  57248063878  {1:11655}
>> 2018-07-24T20:52:48.652  57248063878  57248063878  {1:11655}
>> 2018-07-24T16:36:01.828  57248063878  57248063878  {1:11655}
>> 2018-07-24T12:11:00.026  57248063878  57248063878  {1:11655}
>> 2018-07-24T07:28:04.686  57248063878  57248063878  {1:11655}
>> 2018-07-24T02:47:15.290  57248063878  57248063878  {1:11655}
>> 2018-07-23T22:06:17.410  57248137921  57248063878  {1:11655}
>>
>> We tried setting unchecked_tombstone_compaction to false; it had no effect.
>>
>> The data is a time series; there will be only a handful of cell tombstones present. The table has a TTL, but it'll be at least a month before it takes effect.
>>
>> Table properties:
>>    AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS', 'max_threshold': '32', 'min_threshold': '4', 'unchecked_tombstone_compaction': 'false'}
>>    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>    AND crc_check_chance = 1.0
>>    AND dclocal_read_repair_chance = 0.0
>>    AND default_time_to_live = 63072000
>>    AND gc_grace_seconds = 10800
>>    AND max_index_interval = 2048
>>    AND memtable_flush_period_in_ms = 0
>>    AND min_index_interval = 128
>>    AND read_repair_chance = 0.0
>>    AND speculative_retry = 'NONE';
>>
>> Thanks for any help
>>
>> Martin
Infinite loop of single SSTable compactions
Hi,
we have a table which is being compacted all the time, with no change in size:

Compaction History:
compacted_at             bytes_in     bytes_out    rows_merged
2018-07-25T05:26:48.101  57248063878  57248063878  {1:11655}
2018-07-25T01:09:47.346  57248063878  57248063878  {1:11655}
2018-07-24T20:52:48.652  57248063878  57248063878  {1:11655}
2018-07-24T16:36:01.828  57248063878  57248063878  {1:11655}
2018-07-24T12:11:00.026  57248063878  57248063878  {1:11655}
2018-07-24T07:28:04.686  57248063878  57248063878  {1:11655}
2018-07-24T02:47:15.290  57248063878  57248063878  {1:11655}
2018-07-23T22:06:17.410  57248137921  57248063878  {1:11655}

We tried setting unchecked_tombstone_compaction to false; it had no effect.

The data is a time series; there will be only a handful of cell tombstones present. The table has a TTL, but it'll be at least a month before it takes effect.

Table properties:
   AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS', 'max_threshold': '32', 'min_threshold': '4', 'unchecked_tombstone_compaction': 'false'}
   AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
   AND crc_check_chance = 1.0
   AND dclocal_read_repair_chance = 0.0
   AND default_time_to_live = 63072000
   AND gc_grace_seconds = 10800
   AND max_index_interval = 2048
   AND memtable_flush_period_in_ms = 0
   AND min_index_interval = 128
   AND read_repair_chance = 0.0
   AND speculative_retry = 'NONE';

Thanks for any help

Martin
Re: How to identify which table causing Maximum Memory usage limit
Hi,
we've had this issue with large partitions (100 MB and more). Use nodetool tablehistograms to find partition sizes for each table.

If you have enough heap space to spare, try increasing this parameter:
file_cache_size_in_mb: 512

There's also the following parameter, but I have not tested its impact yet:
buffer_pool_use_heap_if_exhausted: true

Regards,

Martin

On Tue, Jun 5, 2018 at 3:54 PM, learner dba wrote:
>
> Hi,
>
> We see this message often; the cluster has multiple keyspaces and column families.
> How do I know which CF is causing this?
> Or could it be something else?
> Do we need to worry about this message?
>
> INFO [CounterMutationStage-1] 2018-06-05 13:36:35,983 NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB), cannot allocate chunk of 1.000MiB
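[Editor's note: once per-table maximum partition sizes have been collected (e.g. scraped from nodetool tablehistograms / tablestats output), flagging the offenders is a one-liner. A minimal sketch; the input dict format and the 100 MB threshold are assumptions, not anything nodetool emits directly:]

```python
def flag_large_partitions(max_partition_bytes: dict, limit_bytes: int = 100 * 1024**2):
    """Return tables whose maximum partition size exceeds the limit,
    largest first. Input: {table_name: max partition bytes}."""
    return sorted(
        (t for t, size in max_partition_bytes.items() if size > limit_bytes),
        key=lambda t: -max_partition_bytes[t],
    )

# 'Compacted partition maximum bytes' values from the thread's tablestats
sizes = {"ks.events": 386_857_368, "ks.metrics": 20_924_300}
print(flag_large_partitions(sizes))  # ['ks.events']
```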
Re: Repair slow, "Percent repaired" never updated
P.S.: Here's the corresponding log from the second node:

INFO [AntiEntropyStage:1] 2018-06-04 13:37:16,409 Validator.java:281 - [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Sending completed merkle tree to /14.0.53.234 for asm_log.event
INFO [StreamReceiveTask:30] 2018-06-04 14:14:28,989 StreamResultFuture.java:187 - [Stream #6244fd50-67ff-11e8-b07c-c365701888e8] Session with /14.0.53.234 is complete
INFO [StreamReceiveTask:30] 2018-06-04 14:14:28,990 StreamResultFuture.java:219 - [Stream #6244fd50-67ff-11e8-b07c-c365701888e8] All sessions completed
INFO [AntiEntropyStage:1] 2018-06-04 14:14:29,000 ActiveRepairService.java:452 - [repair #af1aefc0-67c0-11e8-b07c-c365701888e8] Not a global repair, will not do anticompaction

Why is there no anticompaction if it's an incremental repair?

We have two datacenters currently; this concerns the second one that we recently brought up (with nodetool rebuild). We cannot do a repair across datacenters, because nodes in the old DC would run out of disk space.

Regards,

Martin

On Tue, Jun 5, 2018 at 6:06 PM, Martin Mačura wrote:
> Hi,
> we're on cassandra 3.11.2, and we're having some issues with repairs. They take ages to complete, and some time ago the incremental repair stopped working - that is, SSTables are not being marked as repaired, even though the repair reports success.
>
> Running a full or incremental repair does not make any difference.
>
> Here's a log of a typical repair (omitted a lot of 'Maximum memory usage' messages):
>
> INFO [Repair-Task-12] 2018-06-04 06:29:50,396 RepairRunnable.java:139 - Starting repair command #11 (af1aefc0-67c0-11e8-b07c-c365701888e8), repairing keyspace prod with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [event], dataCenters: [DC1], hosts: [], # of ranges: 1280, pull repair: false)
> INFO [Repair-Task-12] 2018-06-04 06:29:51,497 RepairSession.java:228 - [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] new session: will sync /14.0.53.234, /14.0.52.115 on range [...] for asm_log.[event]
> INFO [Repair#11:1] 2018-06-04 06:29:51,776 RepairJob.java:169 - [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Requesting merkle trees for event (to [/14.0.52.115, /14.0.53.234])
> INFO [ValidationExecutor:10] 2018-06-04 06:31:13,859 NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB), cannot allocate chunk of 1.000MiB
> WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-06-04 06:32:01,385 NoSpamLogger.java:94 - Out of 14 commit log syncs over the past 134.02s with average duration of 34.90ms, 2 have exceeded the configured commit interval by an average of 60.66ms
> ...
> INFO [ValidationExecutor:10] 2018-06-04 13:31:19,011 NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB), cannot allocate chunk of 1.000MiB
> INFO [AntiEntropyStage:1] 2018-06-04 13:37:17,357 RepairSession.java:180 - [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Received merkle tree for event from /14.0.52.115
> INFO [ValidationExecutor:10] 2018-06-04 13:46:19,281 NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB), cannot allocate chunk of 1.000MiB
> INFO [IndexSummaryManager:1] 2018-06-04 13:57:18,772 IndexSummaryRedistribution.java:76 - Redistributing index summaries
> INFO [AntiEntropyStage:1] 2018-06-04 13:58:21,971 RepairSession.java:180 - [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Received merkle tree for event from /14.0.53.234
> INFO [RepairJobTask:4] 2018-06-04 13:58:39,780 SyncTask.java:73 - [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Endpoints /14.0.52.115 and /14.0.53.234 have 15406 range(s) out of sync for event
> INFO [RepairJobTask:4] 2018-06-04 13:58:39,781 LocalSyncTask.java:71 - [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Performing streaming repair of 15406 ranges with /14.0.52.115
> INFO [RepairJobTask:4] 2018-06-04 13:59:49,075 StreamResultFuture.java:90 - [Stream #6244fd50-67ff-11e8-b07c-c365701888e8] Executing streaming plan for Repair
> INFO [StreamConnectionEstablisher:3] 2018-06-04 13:59:49,076 StreamSession.java:266 - [Stream #6244fd50-67ff-11e8-b07c-c365701888e8] Starting streaming to /14.0.52.115
> INFO [StreamConnectionEstablisher:3] 2018-06-04 13:59:49,089 StreamCoordinator.java:264 - [Stream #6244fd50-67ff-11e8-b07c-c365701888e8, ID#0] Beginning stream session with /14.0.52.115
> INFO [STREAM-IN-/14.0.52.115:7000] 2018-06-04 14:01:14,423 StreamResultFuture.java:173 - [Stream #6244fd50-67ff-11e8-b07c-c365701888e8 ID#0] Prepare completed. Receiving 321 files (6.238GiB), sending 318 files (6.209GiB)
> WARN [Service Thread] 2018-06-04 14:12:15,578 GCInspector.java:282 - ConcurrentMarkSweep GC in 4095ms. CMS O
Repair slow, "Percent repaired" never updated
Hi, we're on cassandra 3.11.2, and we're having some issues with repairs. They take ages to complete, and some time ago the incremental repair stopped working - that is, SSTables are not being marked as repaired, even though the repair reports success. Running a full or incremental repair does not make any difference. Here's a log of a typical repair (omitted a lot of 'Maximum memory usage' messages): INFO [Repair-Task-12] 2018-06-04 06:29:50,396 RepairRunnable.java:139 - Starting repair command #11 (af1aefc0-67c0-11e8-b07c-c365701888e8), repairing keyspace prod with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [event], dataCenters: [DC1], hosts: [], # of ranges: 1280, pull repair: false) INFO [Repair-Task-12] 2018-06-04 06:29:51,497 RepairSession.java:228 - [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] new session: will sync /14.0.53.234, /14.0.52.115 on range [...] for asm_log.[event] INFO [Repair#11:1] 2018-06-04 06:29:51,776 RepairJob.java:169 - [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Requesting merkle trees for event (to [/14.0.52.115, /14.0.53.234]) INFO [ValidationExecutor:10] 2018-06-04 06:31:13,859 NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB), cannot allocate chunk of 1.000MiB WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-06-04 06:32:01,385 NoSpamLogger.java:94 - Out of 14 commit log syncs over the past 134.02s with average duration of 34.90ms, 2 have exceeded the configured commit interval by an average of 60.66ms ... 
INFO [ValidationExecutor:10] 2018-06-04 13:31:19,011 NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB), cannot allocate chunk of 1.000MiB
INFO [AntiEntropyStage:1] 2018-06-04 13:37:17,357 RepairSession.java:180 - [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Received merkle tree for event from /14.0.52.115
INFO [ValidationExecutor:10] 2018-06-04 13:46:19,281 NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB), cannot allocate chunk of 1.000MiB
INFO [IndexSummaryManager:1] 2018-06-04 13:57:18,772 IndexSummaryRedistribution.java:76 - Redistributing index summaries
INFO [AntiEntropyStage:1] 2018-06-04 13:58:21,971 RepairSession.java:180 - [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Received merkle tree for event from /14.0.53.234
INFO [RepairJobTask:4] 2018-06-04 13:58:39,780 SyncTask.java:73 - [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Endpoints /14.0.52.115 and /14.0.53.234 have 15406 range(s) out of sync for event
INFO [RepairJobTask:4] 2018-06-04 13:58:39,781 LocalSyncTask.java:71 - [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Performing streaming repair of 15406 ranges with /14.0.52.115
INFO [RepairJobTask:4] 2018-06-04 13:59:49,075 StreamResultFuture.java:90 - [Stream #6244fd50-67ff-11e8-b07c-c365701888e8] Executing streaming plan for Repair
INFO [StreamConnectionEstablisher:3] 2018-06-04 13:59:49,076 StreamSession.java:266 - [Stream #6244fd50-67ff-11e8-b07c-c365701888e8] Starting streaming to /14.0.52.115
INFO [StreamConnectionEstablisher:3] 2018-06-04 13:59:49,089 StreamCoordinator.java:264 - [Stream #6244fd50-67ff-11e8-b07c-c365701888e8, ID#0] Beginning stream session with /14.0.52.115
INFO [STREAM-IN-/14.0.52.115:7000] 2018-06-04 14:01:14,423 StreamResultFuture.java:173 - [Stream #6244fd50-67ff-11e8-b07c-c365701888e8 ID#0] Prepare completed. Receiving 321 files(6.238GiB), sending 318 files(6.209GiB)
WARN [Service Thread] 2018-06-04 14:12:15,578 GCInspector.java:282 - ConcurrentMarkSweep GC in 4095ms. CMS Old Gen: 4086661264 -> 1107272664; Par Eden Space: 503316480 -> 0; Par Survivor Space: 21541464 -> 0
...
WARN [GossipTasks:1] 2018-06-04 14:12:15,677 FailureDetector.java:288 - Not marking nodes down due to local pause of 5123793157 > 50
INFO [ScheduledTasks:1] 2018-06-04 14:12:20,611 NoSpamLogger.java:91 - Some operations were slow, details available at debug level (debug.log)
INFO [STREAM-IN-/14.0.52.115:7000] 2018-06-04 14:14:29,188 StreamResultFuture.java:187 - [Stream #6244fd50-67ff-11e8-b07c-c365701888e8] Session with /14.0.52.115 is complete
INFO [STREAM-IN-/14.0.52.115:7000] 2018-06-04 14:14:29,190 StreamResultFuture.java:219 - [Stream #6244fd50-67ff-11e8-b07c-c365701888e8] All sessions completed
INFO [STREAM-IN-/14.0.52.115:7000] 2018-06-04 14:14:29,190 LocalSyncTask.java:121 - [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Sync complete using session afc2ef90-67c0-11e8-b07c-c365701888e8 between /14.0.52.115 and /14.0.53.234 on event
INFO [RepairJobTask:5] 2018-06-04 14:14:29,191 RepairJob.java:143 - [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] event is fully synced
INFO [RepairJobTask:5] 2018-06-04 14:14:29,193 RepairSession.java:270 - [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Session completed successfully

Tablestats:
SSTable count: 714
Space used (live): 489416489322
Space used (total): 489416489322
Space used by snapshots (total): 0
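For what it's worth, the repeated 'Maximum memory usage reached (512.000MiB)' messages come from the off-heap chunk cache, which is capped by file_cache_size_in_mb (default 512 MiB in 3.11). If a node has RAM to spare, raising the cap is one option; a sketch of the relevant cassandra.yaml settings, with purely illustrative values:

```yaml
# cassandra.yaml - chunk cache sizing (illustrative values, not a recommendation)
# The NoSpamLogger warning fires once this cache is full; the default is 512 MiB.
file_cache_size_in_mb: 1024

# If the pool is exhausted, fall back to allocating buffers on heap (3.11 default: true)
buffer_pool_use_heap_if_exhausted: true
```

The warning itself is benign (reads fall through to allocating outside the cache), but on a read-heavy table a larger cache can reduce disk I/O during validation.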
Re: Nodes unresponsive after upgrade 3.9 -> 3.11.2
Nevermind, we resolved the issue - the JVM heap settings were misconfigured.

Martin

On Fri, Mar 23, 2018 at 1:18 PM, Martin Mačura <m.mac...@gmail.com> wrote:
> Hi all,
>
> We have a cluster of 3 nodes with RF 3 that ran fine until we upgraded
> it to 3.11.2.
>
> Each node has 32 GB RAM, 8 GB Cassandra heap size.
>
> After the upgrade, clients started reporting connection issues:
>
> cassandra | [ERROR] Closing established connection pool to host
> because of the following error: Read error 'connection
> reset by peer' (src/pool.cpp:384)
> cassandra | [ERROR] Unable to establish a control connection to host
> because of the following error: Error: 'Request timed out'
> (0x010E) (src/control_connection.cpp:263)
>
> Cassandra logs are full of garbage collection warnings:
>
> WARN [Service Thread] 2018-03-23 05:04:17,780 GCInspector.java:282 -
> ConcurrentMarkSweep GC in 7858ms. Par Eden Space: 6871908352 ->
> 1774446288; Par Survivor Space: 858980344 -> 0
> [StatusLogger thread-pool dump snipped; it appears in full in the
> original message in this thread]
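Since the root cause turned out to be heap configuration: in 3.11, heap sizing lives in conf/jvm.options. A minimal sketch of an explicit 8 GB CMS heap on a 32 GB node - values illustrative, not necessarily what was fixed here:

```properties
# conf/jvm.options - explicit heap sizing (illustrative values)
# Pin min and max heap to the same size to avoid resize pauses
-Xms8G
-Xmx8G
# With CMS, an oversized new generation (e.g. from a stray or missing -Xmn)
# leads to long promotion and collection pauses; keep it modest
-Xmn2G
```

If these flags are left unset, cassandra-env.sh computes heap sizes from system RAM, which after an upgrade can silently differ from what was tuned before.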
Nodes unresponsive after upgrade 3.9 -> 3.11.2
Hi all,

We have a cluster of 3 nodes with RF 3 that ran fine until we upgraded it to 3.11.2.

Each node has 32 GB RAM, 8 GB Cassandra heap size.

After the upgrade, clients started reporting connection issues:

cassandra | [ERROR] Closing established connection pool to host because of the following error: Read error 'connection reset by peer' (src/pool.cpp:384)
cassandra | [ERROR] Unable to establish a control connection to host because of the following error: Error: 'Request timed out' (0x010E) (src/control_connection.cpp:263)

Cassandra logs are full of garbage collection warnings:

WARN [Service Thread] 2018-03-23 05:04:17,780 GCInspector.java:282 - ConcurrentMarkSweep GC in 7858ms. Par Eden Space: 6871908352 -> 1774446288; Par Survivor Space: 858980344 -> 0
INFO [Service Thread] 2018-03-23 05:04:17,780 StatusLogger.java:47 - Pool Name  Active  Pending  Completed  Blocked  All Time Blocked
INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 - MutationStage10 92526002 0 0
INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 - ViewMutationStage 0 0 0 0 0
INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 - ReadStage 2 2 943544 0 0
INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 - RequestResponseStage 0 01666876 0 0
INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 - ReadRepairStage 0 0 10362 0 0
INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 - CounterMutationStage 0 0 0 0 0
INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 - MiscStage 0 0 0 0 0
INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 - CompactionExecutor0 0 3076 0 0
INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 - MemtableReclaimMemory 0 0 44 0 0
INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 - PendingRangeCalculator0 0 4 0 0
INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 - GossipStage 0 0 14287 0 0
INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 - SecondaryIndexManagement 0 0 0 0 0
INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 - HintsDispatcher 0 0 1 0 0
INFO [Service Thread] 2018-03-23 05:04:17,804 StatusLogger.java:51 - PerDiskMemtableFlushWriter_1 0 0 37 0 0
INFO [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 - PerDiskMemtableFlushWriter_2 0 0 37 0 0
INFO [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 - MigrationStage0 0 2 0 0
INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 - MemtablePostFlush 0 0 72 0 0
INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 - PerDiskMemtableFlushWriter_0 0 0 44 0 0
INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 - ValidationExecutor0 0 0 0 0
INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 - Sampler 0 0 0 0 0
INFO [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 - MemtableFlushWriter 0 0 44 0 0
INFO [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 - PerDiskMemtableFlushWriter_5 0 0 37 0 0
INFO [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 - InternalResponseStage 0 0 0 0 0
INFO [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 - PerDiskMemtableFlushWriter_3 0 0 37 0 0
INFO [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 - PerDiskMemtableFlushWriter_4 0 0 37 0 0
INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 - AntiEntropyStage 0
Re: Rebuild to a new DC fails every time
Thanks for the tips, Alan.

The cluster is entirely healthy. But the connection between DCs is a VPN, managed by a third party - it is possible it might be flaky. However, I would expect the rebuild job to be able to recover from connection timeout/reset errors without the need for manual intervention.

In the end we opted for a restore from snapshot + repair to bring up the node in the new DC. We'll see how that goes.

Regards,
Martin

- To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org
Re: Rebuild to a new DC fails every time
None of the files is listed more than once in the logs:

java.lang.RuntimeException: Transfer of file /fs3/cassandra/data//event_group-3b5782d08e4411e6842917253f111990/mc-116042-big-Data.db already completed or aborted (perhaps session failed?).
java.lang.RuntimeException: Transfer of file /fs0/cassandra/data//event_group-3b5782d08e4411e6842917253f111990/mc-111370-big-Data.db already completed or aborted (perhaps session failed?).
java.lang.RuntimeException: Transfer of file /fs3/cassandra/data//event_alert-13d78e3f11e6a6cbe1698349da4d/mc-8659-big-Data.db already completed or aborted (perhaps session failed?).
java.lang.RuntimeException: Transfer of file /fs4/cassandra/data//event_alert-13d78e3f11e6a6cbe1698349da4d/mc-9133-big-Data.db already completed or aborted (perhaps session failed?).
java.lang.RuntimeException: Transfer of file /fs2/cassandra/data//event_alert-13d78e3f11e6a6cbe1698349da4d/mc-3997-big-Data.db already completed or aborted (perhaps session failed?).
java.lang.RuntimeException: Transfer of file /fs1/cassandra/data///event_group-3b5782d08e4411e6842917253f111990/mc-152979-big-Data.db already completed or aborted (perhaps session failed?).

On Mon, Jan 8, 2018 at 2:21 AM, kurt greaves <k...@instaclustr.com> wrote:
> If you're on 3.9 it's likely unrelated as streaming_socket_timeout_in_ms is
> 48 hours. Appears rebuild is trying to stream the same file twice. Are there
> other exceptions in the logs related to the file, or can you find out if
> it's previously been sent by the same session? Search the logs for the file
> that failed and post back any exceptions.
>
> On 29 December 2017 at 10:18, Martin Mačura <m.mac...@gmail.com> wrote:
>>
>> Is this something that can be resolved by CASSANDRA-11841 ?
>>
>> Thanks,
>>
>> Martin
Re: Rebuild to a new DC fails every time
Is this something that can be resolved by CASSANDRA-11841 ?

Thanks,

Martin

On Thu, Dec 21, 2017 at 3:02 PM, Martin Mačura <m.mac...@gmail.com> wrote:
> Hi all,
>
> we are trying to add a new datacenter to the existing cluster, but the
> 'nodetool rebuild' command always fails after a couple of hours.
>
> We're on Cassandra 3.9.
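As context for the CASSANDRA-11841 question: the 'Connection reset by peer' / 'Connection timed out' pattern over a third-party VPN is exactly what that ticket targets - it added a keep-alive to streaming in 3.10+ so that idle WAN links fail fast instead of being silently dropped. A sketch of the related cassandra.yaml knobs, with illustrative values only:

```yaml
# cassandra.yaml - streaming over a WAN/VPN (illustrative values)
# 3.9-era knob: abort a stream whose socket has been idle this long
streaming_socket_timeout_in_ms: 86400000

# 3.10+ (CASSANDRA-11841): periodic keep-alive probes on streaming connections,
# so a stalled VPN link is detected promptly
streaming_keep_alive_period_in_secs: 300
```

On 3.9 itself the keep-alive is unavailable, so upgrading before the rebuild (or tolerating restarts of the rebuild) are the practical options.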
Rebuild to a new DC fails every time
Hi all,

we are trying to add a new datacenter to the existing cluster, but the 'nodetool rebuild' command always fails after a couple of hours.

We're on Cassandra 3.9.

Example 1:

172.24.16.169 INFO [STREAM-IN-/172.25.16.125:55735] 2017-12-13 23:55:38,840 StreamResultFuture.java:174 - [Stream #b8faf130-e092-11e7-bab5-0d4fb7c90e72 ID#0] Prepare completed. Receiving 0 files(0.000KiB), sending 9844 files(885.587GiB)
172.25.16.125 INFO [STREAM-IN-/172.24.16.169:7000] 2017-12-13 23:55:38,858 StreamResultFuture.java:174 - [Stream #b8faf130-e092-11e7-bab5-0d4fb7c90e72 ID#0] Prepare completed. Receiving 9844 files(885.587GiB), sending 0 files(0.000KiB)

172.24.16.169 ERROR [STREAM-IN-/172.25.16.125:55735] 2017-12-14 04:28:09,064 StreamSession.java:533 - [Stream #b8faf130-e092-11e7-bab5-0d4fb7c90e72] Streaming error occurred on session with peer 172.25.16.125
172.24.16.169 java.io.IOException: Connection reset by peer

172.24.16.169 ERROR [STREAM-OUT-/172.25.16.125:49412] 2017-12-14 07:26:26,832 StreamSession.java:533 - [Stream #b8faf130-e092-11e7-bab5-0d4fb7c90e72] Streaming error occurred on session with peer 172.25.16.125
172.24.16.169 java.lang.RuntimeException: Transfer of file -13d78e3f11e6a6cbe1698349da4d/mc-8659-big-Data.db already completed or aborted (perhaps session failed?).
172.25.16.125 ERROR [STREAM-OUT-/172.24.16.169:7000] 2017-12-14 07:26:50,004 StreamSession.java:533 - [Stream #b8faf130-e092-11e7-bab5-0d4fb7c90e72] Streaming error occurred on session with peer 172.24.16.169
172.25.16.125 java.io.IOException: Connection reset by peer

Example 2:

172.24.16.169 INFO [STREAM-IN-/172.25.16.125:35202] 2017-12-18 03:24:31,423 StreamResultFuture.java:174 - [Stream #95d36300-e3d4-11e7-a90b-2b89506ad2af ID#0] Prepare completed. Receiving 0 files(0.000KiB), sending 12312 files(895.973GiB)
172.25.16.125 INFO [STREAM-IN-/172.24.16.169:7000] 2017-12-18 03:24:31,441 StreamResultFuture.java:174 - [Stream #95d36300-e3d4-11e7-a90b-2b89506ad2af ID#0] Prepare completed. Receiving 12312 files(895.973GiB), sending 0 files(0.000KiB)

172.24.16.169 ERROR [STREAM-IN-/172.25.16.125:35202] 2017-12-18 06:39:42,049 StreamSession.java:533 - [Stream #95d36300-e3d4-11e7-a90b-2b89506ad2af] Streaming error occurred on session with peer 172.25.16.125
172.24.16.169 java.io.IOException: Connection reset by peer

172.24.16.169 ERROR [STREAM-OUT-/172.25.16.125:42744] 2017-12-18 09:25:36,188 StreamSession.java:533 - [Stream #95d36300-e3d4-11e7-a90b-2b89506ad2af] Streaming error occurred on session with peer 172.25.16.125
172.24.16.169 java.lang.RuntimeException: Transfer of file -3b5782d08e4411e6842917253f111990/mc-152979-big-Data.db already completed or aborted (perhaps session failed?).
172.25.16.125 ERROR [STREAM-OUT-/172.24.16.169:7000] 2017-12-18 09:25:59,447 StreamSession.java:533 - [Stream #95d36300-e3d4-11e7-a90b-2b89506ad2af] Streaming error occurred on session with peer 172.24.16.169
172.25.16.125 java.io.IOException: Connection timed out

Datacenter: PRIMARY
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
UN  172.24.16.169  918.31 GiB  256     100.0%            bc4a980b-cca6-4ca2-b32f-f8206d48e14c  RAC1
UN  172.24.16.170  908.76 GiB  256     100.0%            37b2742e-c83a-4341-896f-09d244810e69  RAC1
UN  172.24.16.171  908.44 GiB  256     100.0%            6dc2b9d8-75dd-48f8-858c-53b1af42e8fb  RAC1

Datacenter: SECONDARY
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.25.16.125  27.48 GiB  256     100.0%            1e1669eb-cfd2-4718-a073-558946a8c947  RAC2
UN  172.25.16.124  28.24 GiB  256     100.0%            896d9894-10c8-4269-9476-5ddab3c8abe9  RAC2

Any ideas?

Thanks,

Martin