[
https://issues.apache.org/jira/browse/CASSANDRA-6918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945597#comment-13945597
]
Jonathan Ellis commented on CASSANDRA-6918:
-------------------------------------------
[~iamaleksey] is this something that counters++ will fix or do you think it is
more general than counters?
> Compaction Assert: Incorrect Row Data Size
> ------------------------------------------
>
> Key: CASSANDRA-6918
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6918
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: 11 node Linux Cassandra 1.2.15 cluster, each node
> configured as follows:
> 2P Intel Xeon CPU X5660 @ 2.8 GHz (12 cores, 24 threads total)
> 148 GB RAM
> CentOS release 6.4 (Final)
> 2.6.32-358.11.1.el6.x86_64 #1 SMP Wed May 15 10:48:38 EDT 2013 x86_64 x86_64
> x86_64 GNU/Linux
> Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
> Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)
> Node configuration:
> Default cassandra.yaml settings for the most part with the following
> exceptions:
> rpc_server_type: hsha
> Reporter: Alexander Goodrich
> Fix For: 1.2.16
>
>
> I have four tables in a schema with replication factor 6 (previously we set
> this to 3, but when we added more nodes we figured that more replication would
> improve read times; this may have aggravated the issue).
> create table table_value_one (
>     id timeuuid PRIMARY KEY,
>     value_1 counter
> );
>
> create table table_value_two (
>     id timeuuid PRIMARY KEY,
>     value_2 counter
> );
>
> create table table_position_lookup (
>     value_1 bigint,
>     value_2 bigint,
>     id timeuuid,
>     PRIMARY KEY (id)
> ) WITH compaction={'class': 'LeveledCompactionStrategy'};
>
> create table sorted_table (
>     row_key_index text,
>     range bigint,
>     sorted_value bigint,
>     id timeuuid,
>     extra_data list<bigint>,
>     PRIMARY KEY ((row_key_index, range), sorted_value, id)
> ) WITH CLUSTERING ORDER BY (sorted_value DESC) AND
>   compaction={'class': 'LeveledCompactionStrategy'};
> The application creates an object and stores it in sorted_table based on its
> value positions - for example, an object has a value_1 of 5500 and a value_2
> of 4300.
> There are rows which act as indices by which I can sort items on these values
> in descending order. If I want the items with the highest value_1, I create an
> index row such as:
> row_key_index = 'highest_value_1s'
> Additionally, we shard each index row into bucket ranges, where the range is
> the value_1 or value_2 rounded down to the nearest 1000. For example, our
> object above would be found in row_key_index = 'highest_value_1s' with range
> 5000, and also in row_key_index = 'highest_value_2s' with range 4000.
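> As an illustrative sketch (hypothetical timeuuid and extra_data values; bucket
> width of 1000 as described above), the index writes for this object look
> roughly like:
> BEGIN BATCH
>   INSERT INTO sorted_table (row_key_index, range, sorted_value, id, extra_data)
>     VALUES ('highest_value_1s', 5000, 5500, 4a6f1e30-b35d-11e3-a5e2-0800200c9a66, [10, 20, 30]);
>   INSERT INTO sorted_table (row_key_index, range, sorted_value, id, extra_data)
>     VALUES ('highest_value_2s', 4000, 4300, 4a6f1e30-b35d-11e3-a5e2-0800200c9a66, [10, 20, 30]);
> APPLY BATCH;
> -- reading the top items in one bucket (clustering order gives highest first):
> SELECT id, sorted_value, extra_data FROM sorted_table
>   WHERE row_key_index = 'highest_value_1s' AND range = 5000 LIMIT 100;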
> The true values of this object are stored in two counter tables,
> table_value_one and table_value_two. The current indexed position is stored
> in table_position_lookup.
> We allow the application to modify value_1 and value_2 in the counter tables
> at any time. When we know the current values are dirty, we wait a tuned amount
> of time before updating the item's position in the sorted_table index. Each
> reposition results in 2 delete operations and 2 write operations on the same
> table, as sketched below.
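> A sketch of one reposition cycle (hypothetical values, same placeholder
> timeuuid as above; value_1 has moved from 5500 to 6100):
> -- the counter is bumped first, independently of the index:
> UPDATE table_value_one SET value_1 = value_1 + 600
>   WHERE id = 4a6f1e30-b35d-11e3-a5e2-0800200c9a66;
> -- later, the coalesced reposition deletes the old index entry and writes the new one:
> DELETE FROM sorted_table
>   WHERE row_key_index = 'highest_value_1s' AND range = 5000
>     AND sorted_value = 5500 AND id = 4a6f1e30-b35d-11e3-a5e2-0800200c9a66;
> INSERT INTO sorted_table (row_key_index, range, sorted_value, id, extra_data)
>   VALUES ('highest_value_1s', 6000, 6100, 4a6f1e30-b35d-11e3-a5e2-0800200c9a66, [10, 20, 30]);
> -- the same delete/insert pair is issued for the 'highest_value_2s' index row,
> -- and the lookup table is refreshed:
> INSERT INTO table_position_lookup (id, value_1, value_2)
>   VALUES (4a6f1e30-b35d-11e3-a5e2-0800200c9a66, 6100, 4300);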
> The issue is when we expand the number of write/delete operations on
> sorted_table, we see the following assert in the system log:
> ERROR [CompactionExecutor:169] 2014-03-24 08:07:12,871 CassandraDaemon.java (line 191) Exception in thread Thread[CompactionExecutor:169,1,main]
> java.lang.AssertionError: incorrect row data size 77705872 written to /var/lib/cassandra/data/loadtest_1/sorted_table/loadtest_1-sorted_table-tmp-ic-165-Data.db; correct is 77800512
>         at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)
>         at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
>         at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
>         at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:208)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:724)
> Each object creates approximately 500 unique row keys in sorted_table, and
> each carries an extra_data field containing roughly 15 different bigint
> values.
> Previously, our application was running Cassandra 1.2.10 and we did not see
> the assert; at that time sorted_table did not have the extra_data list<bigint>
> column, and we were writing only around 200 unique row keys containing just
> the id column.
> We tried both leveled compaction and size-tiered compaction, and both hit the
> same assert - compaction fails to complete. After about 100k object writes
> (creating 55 million rows, each potentially having as many as 100k items in a
> single column), we have ~2.4 GB of SSTables spread across 4840 on-disk files
> (691 SSTables):
> SSTable count: 691
> SSTables in each level: [685/4, 6, 0, 0, 0, 0, 0, 0, 0]
> Space used (live): 2244774352
> Space used (total): 2251159892
> SSTable Compression Ratio: 0.15101393198465862
> Number of Keys (estimate): 4704128
> Memtable Columns Count: 0
> Memtable Data Size: 0
> Memtable Switch Count: 264
> Read Count: 9204
> Read Latency: NaN ms.
> Write Count: 10151343
> Write Latency: NaN ms.
> Pending Tasks: 0
> Bloom Filter False Positives: 0
> Bloom Filter False Ratio: 0.00000
> Bloom Filter Space Used: 3500496
> Compacted row minimum size: 125
> Compacted row maximum size: 62479625
> Compacted row mean size: 1285302
> Average live cells per slice (last five minutes): 1001.0
> Average tombstones per slice (last five minutes): 8566.5
> Some mitigation strategies we have discussed include:
> * Breaking sorted_table into multiple column families to spread the writes
> across them.
> * Increasing the coalescing time delay.
> * Removing extra_data and paying the cost of another table lookup for each
> item.
> * Compressing extra_data into a blob.
> * Reducing the replication factor back down to 3 to reduce size pressure on
> the SSTables.
> Running nodetool repair -pr does not fix the issue, and running nodetool
> compact manually has not solved it either. The asserts happen fairly
> frequently across all nodes of the cluster.
--
This message was sent by Atlassian JIRA
(v6.2#6252)