[
https://issues.apache.org/jira/browse/CASSANDRA-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395138#comment-14395138
]
Roman Tkachenko commented on CASSANDRA-9045:
--------------------------------------------
These are the log lines I found that mention this SSTable:
{code}
INFO [CompactionExecutor:96] 2015-04-03 00:12:51,256 CompactionTask.java (line
296) Compacted 38 sstables to
[/var/mailgun/sstables1/blackbook/bounces/blackbook-bounces-jb-44691,/var/mailgun/sstables1/blackbook/bounces/blackbook-bounces-jb-44797,/var/mailgun/sstables1/blackbook/bounces/blackbook-bounces-jb-44838,/var/mailgun/sstables2/blackbook/bounces/blackbook-bounces-jb-44917,/var/mailgun/sstables3/blackbook/bounces/blackbook-bounces-jb-45038,/var/mailgun/sstables2/blackbook/bounces/blackbook-bounces-jb-45076,].
2,024,901,266 bytes to 1,649,830,502 (~81% of original) in 262,455ms =
5.994936MB/s. 11,277 total partitions merged to 10,647. Partition merge
counts were {1:10108, 2:557, 3:6, 7:2, 10:1, 13:1, }
INFO [CompactionExecutor:153] 2015-04-03 00:26:51,990 CompactionTask.java
(line 120) Compacting
[SSTableReader(path='/var/mailgun/sstables3/blackbook/bounces/blackbook-bounces-jb-45038-Data.db'),
SSTableReader(path='/var/mailgun/sstables3/blackbook/bounces/blackbook-bounces-jb-45172-Data.db'),
SSTableReader(path='/var/mailgun/sstables2/blackbook/bounces/blackbook-bounces-jb-45165-Data.db'),
SSTableReader(path='/var/mailgun/sstables1/blackbook/bounces/blackbook-bounces-jb-45181-Data.db'),
SSTableReader(path='/var/mailgun/sstables2/blackbook/bounces/blackbook-bounces-jb-45152-Data.db'),
SSTableReader(path='/var/mailgun/sstables1/blackbook/bounces/blackbook-bounces-jb-44797-Data.db'),
SSTableReader(path='/var/mailgun/sstables1/blackbook/bounces/blackbook-bounces-jb-44838-Data.db'),
SSTableReader(path='/var/mailgun/sstables2/blackbook/bounces/blackbook-bounces-jb-44917-Data.db'),
SSTableReader(path='/var/mailgun/sstables3/blackbook/bounces/blackbook-bounces-jb-45164-Data.db'),
SSTableReader(path='/var/mailgun/sstables1/blackbook/bounces/blackbook-bounces-jb-44691-Data.db'),
SSTableReader(path='/var/mailgun/sstables3/blackbook/bounces/blackbook-bounces-jb-45169-Data.db'),
SSTableReader(path='/var/mailgun/sstables1/blackbook/bounces/blackbook-bounces-jb-45171-Data.db'),
SSTableReader(path='/var/mailgun/sstables3/blackbook/bounces/blackbook-bounces-jb-45163-Data.db'),
SSTableReader(path='/var/mailgun/sstables3/blackbook/bounces/blackbook-bounces-jb-45161-Data.db'),
SSTableReader(path='/var/mailgun/sstables3/blackbook/bounces/blackbook-bounces-jb-45159-Data.db'),
SSTableReader(path='/var/mailgun/sstables3/blackbook/bounces/blackbook-bounces-jb-45180-Data.db'),
SSTableReader(path='/var/mailgun/sstables3/blackbook/bounces/blackbook-bounces-jb-45156-Data.db'),
SSTableReader(path='/var/mailgun/sstables3/blackbook/bounces/blackbook-bounces-jb-45179-Data.db'),
SSTableReader(path='/var/mailgun/sstables3/blackbook/bounces/blackbook-bounces-jb-45176-Data.db'),
SSTableReader(path='/var/mailgun/sstables3/blackbook/bounces/blackbook-bounces-jb-45160-Data.db'),
SSTableReader(path='/var/mailgun/sstables3/blackbook/bounces/blackbook-bounces-jb-45167-Data.db'),
SSTableReader(path='/var/mailgun/sstables3/blackbook/bounces/blackbook-bounces-jb-45173-Data.db'),
SSTableReader(path='/var/mailgun/sstables3/blackbook/bounces/blackbook-bounces-jb-45170-Data.db'),
SSTableReader(path='/var/mailgun/sstables1/blackbook/bounces/blackbook-bounces-jb-45174-Data.db'),
SSTableReader(path='/var/mailgun/sstables2/blackbook/bounces/blackbook-bounces-jb-45076-Data.db'),
SSTableReader(path='/var/mailgun/sstables3/blackbook/bounces/blackbook-bounces-jb-45157-Data.db'),
SSTableReader(path='/var/mailgun/sstables2/blackbook/bounces/blackbook-bounces-jb-45158-Data.db'),
SSTableReader(path='/var/mailgun/sstables2/blackbook/bounces/blackbook-bounces-jb-45168-Data.db'),
SSTableReader(path='/var/mailgun/sstables1/blackbook/bounces/blackbook-bounces-jb-45162-Data.db'),
SSTableReader(path='/var/mailgun/sstables3/blackbook/bounces/blackbook-bounces-jb-45175-Data.db'),
SSTableReader(path='/var/mailgun/sstables1/blackbook/bounces/blackbook-bounces-jb-45166-Data.db'),
SSTableReader(path='/var/mailgun/sstables2/blackbook/bounces/blackbook-bounces-jb-45177-Data.db'),
SSTableReader(path='/var/mailgun/sstables1/blackbook/bounces/blackbook-bounces-jb-45155-Data.db'),
SSTableReader(path='/var/mailgun/sstables2/blackbook/bounces/blackbook-bounces-jb-45178-Data.db'),
SSTableReader(path='/var/mailgun/sstables1/blackbook/bounces/blackbook-bounces-jb-45182-Data.db'),
SSTableReader(path='/var/mailgun/sstables2/blackbook/bounces/blackbook-bounces-jb-45154-Data.db'),
SSTableReader(path='/var/mailgun/sstables3/blackbook/bounces/blackbook-bounces-jb-45153-Data.db')]
INFO [ValidationExecutor:5] 2015-04-03 00:30:42,861 SSTableReader.java (line
223) Opening
/var/mailgun/sstables1/blackbook/bounces/snapshots/a3086410-d998-11e4-a470-75a34b607670/blackbook-bounces-jb-44797
(183391464 bytes)
{code}
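For anyone who wants to pull the same kind of lines out of their own logs, here is a minimal sketch of the filtering step. It assumes the log has already been read into a list of strings, one entry per log record; the function name `lines_mentioning_sstable` is hypothetical and not part of Cassandra.

```python
import re


def lines_mentioning_sstable(log_lines, sstable_name):
    """Return the log lines that mention the given SSTable name.

    sstable_name is the file name stem without the '-Data.db' suffix,
    e.g. 'blackbook-bounces-jb-44797'.
    """
    # Reject matches that are followed by another digit, so that
    # 'jb-44797' does not also match a later generation such as 'jb-447970'.
    pattern = re.compile(re.escape(sstable_name) + r"(?!\d)")
    return [line for line in log_lines if pattern.search(line)]
```

The negative lookahead is there only because SSTable generation numbers share prefixes; a plain substring search would work as long as no longer generation number starts with the same digits.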
> Deleted columns are resurrected after repair in wide rows
> ---------------------------------------------------------
>
> Key: CASSANDRA-9045
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9045
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Roman Tkachenko
> Assignee: Marcus Eriksson
> Priority: Critical
> Fix For: 2.0.15
>
> Attachments: 9045-debug-tracing.txt,
> apache-cassandra-2.0.13-SNAPSHOT.jar, cqlsh.txt, debug.txt, inconsistency.txt
>
>
> Hey guys,
> After almost a week of researching the issue and trying out multiple things
> with (almost) no luck, it was suggested to me (on the user@cass list) that I
> file a report here.
> h5. Setup
> Cassandra 2.0.13 (we had the issue with 2.0.10 as well and upgraded to see if
> it would go away)
> Multi-datacenter cluster, 12+6 nodes.
> h5. Schema
> {code}
> cqlsh> describe keyspace blackbook;
> CREATE KEYSPACE blackbook WITH replication = {
> 'class': 'NetworkTopologyStrategy',
> 'IAD': '3',
> 'ORD': '3'
> };
> USE blackbook;
> CREATE TABLE bounces (
> domainid text,
> address text,
> message text,
> "timestamp" bigint,
> PRIMARY KEY (domainid, address)
> ) WITH
> bloom_filter_fp_chance=0.100000 AND
> caching='KEYS_ONLY' AND
> comment='' AND
> dclocal_read_repair_chance=0.100000 AND
> gc_grace_seconds=864000 AND
> index_interval=128 AND
> read_repair_chance=0.000000 AND
> populate_io_cache_on_flush='false' AND
> default_time_to_live=0 AND
> speculative_retry='99.0PERCENTILE' AND
> memtable_flush_period_in_ms=0 AND
> compaction={'class': 'LeveledCompactionStrategy'} AND
> compression={'sstable_compression': 'LZ4Compressor'};
> {code}
> h5. Use case
> Each row (defined by a domainid) can have many, many columns (bounce entries)
> so rows can get pretty wide. In practice, most of the rows are not that big
> but some of them contain hundreds of thousands and even millions of columns.
> Columns are not TTL'ed but can be deleted using the following CQL3 statement:
> {code}
> delete from bounces where domainid = 'domain.com' and address =
> '[email protected]';
> {code}
> All queries are performed using LOCAL_QUORUM CL.
> h5. Problem
> We weren't very diligent about running repairs on the cluster initially, but
> shortly after we started doing it we noticed that some of the previously deleted
> columns (bounce entries) were there again, as if the tombstones had disappeared.
> I have run this test multiple times via cqlsh, on the row of the customer who
> originally reported the issue:
> * delete an entry
> * verify it's not returned even with CL=ALL
> * run repair on nodes that own this row's key
> * the columns reappear and are returned even with CL=ALL
> I tried the same test on another row with much less data and everything was
> correctly deleted and didn't reappear after repair.
> h5. Other steps I've taken so far
> Made sure NTP is running on all servers and clocks are synchronized.
> Increased gc_grace_seconds to 100 days, ran full repair (on the affected
> keyspace) on all nodes, then changed it back to the default 10 days again.
> Didn't help.
> Performed one more test. Updated one of the resurrected columns, then deleted
> it and ran repair again. This time the updated version of the column
> reappeared.
> Finally, I noticed these log entries for the row in question:
> {code}
> INFO [ValidationExecutor:77] 2015-03-25 20:27:43,936
> CompactionController.java (line 192) Compacting large row
> blackbook/bounces:4ed558feba8a483733001d6a (279067683 bytes) incrementally
> {code}
> Figuring it might be related, I bumped "in_memory_compaction_limit_in_mb" to
> 512MB so the row fits into it, deleted the entry and ran repair once again.
> The log entry for this row was gone and the columns didn't reappear.
> We have a lot of rows much larger than 512MB, so we can't keep increasing this
> parameter forever, if that is the issue.
> Please let me know if you need more information on the case or if I can run
> more experiments.
> Thanks!
> Roman
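The figures quoted in the description can be sanity-checked with a few lines of arithmetic. This is only a sketch: it assumes the limit is interpreted as MiB (1024 * 1024 bytes) and that a row is compacted incrementally when its size is strictly greater than the limit. The 64 MB value is assumed to be the default for 2.0.x, and the helper names are hypothetical.

```python
MIB = 1024 * 1024
SECONDS_PER_DAY = 24 * 60 * 60


def exceeds_in_memory_limit(row_bytes, limit_mb):
    """True if a row of row_bytes is larger than a limit given in MiB.

    Assumption: Cassandra compares the row size against the limit
    converted to bytes with a strict 'greater than'.
    """
    return row_bytes > limit_mb * MIB


def days_to_seconds(days):
    """Convert a number of days to a gc_grace_seconds value."""
    return days * SECONDS_PER_DAY


row_bytes = 279067683  # size reported in the "Compacting large row" log entry

print(round(row_bytes / MIB, 1))                # row size in MiB
print(exceeds_in_memory_limit(row_bytes, 64))   # against the assumed default
print(exceeds_in_memory_limit(row_bytes, 512))  # against the raised limit
print(days_to_seconds(10), days_to_seconds(100))
```

Under these assumptions the row is about 266 MiB, which is above a 64 MB limit and below 512 MB. That matches the observation that the "Compacting large row" entry disappeared once the limit was raised.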