[
https://issues.apache.org/jira/browse/CASSANDRA-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Penick updated CASSANDRA-6534:
--------------------------------------
Description:
We noticed extremely slow insertion rates to a single partition key, using
composite column with a collection value. We were not able to replicate the
issue using the same schema, but with a non-colleciton value even with much
larger values. There are also tons of these messages in the system.log:
"GCInspector.java (line 119) GC for ConcurrentMarkSweep: 1287 ms for 2
collections, 1233256368 used; max is 8375238656"
We are inserting a tiny amounts of data 32-64 bytes and seeing the issue after
only a couple 10k inserts. The amount of memory being used by C*/JVM is no
where near proportional to the amount data being inserted. Why is C* consuming
so much memory?
Attached is a picture of the GC under the one of the pathological tests. Keep
in mind we are only inserting 128KB - 256KB of data and we are almost hitting
the limit of the heap.
GC flags:
-XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42
-Xms8192M
-Xmx8192M
-Xmn2048M
-XX:+HeapDumpOnOutOfMemoryError
-Xss180k
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseTLAB
Example schemas:
Note: The type of collection or primitive type in the collection doesn't seem
to matter.
{code}
CREATE TABLE test.test (
row_key text,
column_key uuid,
column_value list<int>,
PRIMARY KEY(row_key, column_key));
CREATE TABLE test.test (
row_key text,
column_key uuid,
column_value map<text, text>,
PRIMARY KEY(row_key, column_key));
{code}
Example inserts:
Note: This issue is able to be replicated with extremely small inserts (a well
as larger ~1KB)
{code}
INSERT INTO test.test
(row_key, column_key, column_value)
VALUES
('0000000001', e0138677-7246-11e3-ac78-016ae7083d37, [0, 1, 2, 3]);
INSERT INTO test.test
(row_key, column_key, column_value)
VALUES
('0000000022', 1ac5770a-7247-11e3-80e4-016ae7083d37, { 'a':
'0123456701234567012345670', 'b': '0123456701234567012345670' });
{code}
As a comparison I was able to run the same tests with the following schema with
no issue:
Note: This test was able to run a much faster insertion speed and much bigger
column sizes (1KB) without any GC issues.
{code}
CREATE TABLE test.test (
row_key text,
column_key uuid,
column_value text,
PRIMARY KEY(row_key, column_key) )
{code}
was:
We noticed extremely slow insertion rates to a single partition key, using
composite column with a collection value. We were not able to replicate the
issue using the same schema, but with a non-colleciton value even with much
larger values.
There are tons of these in the logs:
"GCInspector.java (line 119) GC for ConcurrentMarkSweep: 1287 ms for 2
collections, 1233256368 used; max is 8375238656"
We are inserting a tiny amounts of data 32-64 bytes and seeing the issue after
only a couple 10k inserts. The amount of memory being used by C*/JVM is no
where near proportional to the amount data being inserted. Why is C* consuming
so much memory?
Attached are pictures of the GC under the different tests. Keep in mind we are
only inserting 128KB - 256KB of data and we are almost hitting the limit of the
heap.
GC flags:
-XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42
-Xms8192M
-Xmx8192M
-Xmn2048M
-XX:+HeapDumpOnOutOfMemoryError
-Xss180k
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseTLAB
Example schemas:
{code}
CREATE TABLE test.test (
row_key text,
column_key uuid,
column_value list<int>,
PRIMARY KEY(row_key, column_key));
CREATE TABLE test.test (
row_key text,
column_key uuid,
column_value map<text, text>,
PRIMARY KEY(row_key, column_key));
{code}
Example inserts:
Note: This issue is able to be replicated with extremely small inserts (a well
as larger ~1KB)
{code}
INSERT INTO test.test
(row_key, column_key, column_value)
VALUES
('0000000001', e0138677-7246-11e3-ac78-016ae7083d37, [0, 1, 2, 3]);
INSERT INTO test.test
(row_key, column_key, column_value)
VALUES
('0000000022', 1ac5770a-7247-11e3-80e4-016ae7083d37, { 'a':
'0123456701234567012345670', 'b': '0123456701234567012345670' });
{code}
As a comparison I was able to run the same tests with the following schema with
no issue:
Note: This test was able to run a much faster insertion speed and much bigger
column sizes (1KB) without any GC issues.
{code}
CREATE TABLE test.test (
row_key text,
column_key uuid,
column_value text,
PRIMARY KEY(row_key, column_key) )
{code}
> Slow inserts with collections into a single partition (Pathological GC
> behavior)
> --------------------------------------------------------------------------------
>
> Key: CASSANDRA-6534
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6534
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: dsc12-1.2.12-1.noarch.rpm
> cassandra12-1.2.12-1.noarch.rpm
> centos 6.4
> Reporter: Michael Penick
> Fix For: 1.2.12
>
> Attachments: GC_behavior.png
>
>
> We noticed extremely slow insertion rates to a single partition key, using
> composite column with a collection value. We were not able to replicate the
> issue using the same schema, but with a non-colleciton value even with much
> larger values. There are also tons of these messages in the system.log:
> "GCInspector.java (line 119) GC for ConcurrentMarkSweep: 1287 ms for 2
> collections, 1233256368 used; max is 8375238656"
> We are inserting a tiny amounts of data 32-64 bytes and seeing the issue
> after only a couple 10k inserts. The amount of memory being used by C*/JVM is
> no where near proportional to the amount data being inserted. Why is C*
> consuming so much memory?
> Attached is a picture of the GC under the one of the pathological tests. Keep
> in mind we are only inserting 128KB - 256KB of data and we are almost hitting
> the limit of the heap.
> GC flags:
> -XX:+UseThreadPriorities
> -XX:ThreadPriorityPolicy=42
> -Xms8192M
> -Xmx8192M
> -Xmn2048M
> -XX:+HeapDumpOnOutOfMemoryError
> -Xss180k
> -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseTLAB
> Example schemas:
> Note: The type of collection or primitive type in the collection doesn't seem
> to matter.
> {code}
> CREATE TABLE test.test (
> row_key text,
> column_key uuid,
> column_value list<int>,
> PRIMARY KEY(row_key, column_key));
> CREATE TABLE test.test (
> row_key text,
> column_key uuid,
> column_value map<text, text>,
> PRIMARY KEY(row_key, column_key));
> {code}
> Example inserts:
> Note: This issue is able to be replicated with extremely small inserts (a
> well as larger ~1KB)
> {code}
> INSERT INTO test.test
> (row_key, column_key, column_value)
> VALUES
> ('0000000001', e0138677-7246-11e3-ac78-016ae7083d37, [0, 1, 2, 3]);
> INSERT INTO test.test
> (row_key, column_key, column_value)
> VALUES
> ('0000000022', 1ac5770a-7247-11e3-80e4-016ae7083d37, { 'a':
> '0123456701234567012345670', 'b': '0123456701234567012345670' });
> {code}
> As a comparison I was able to run the same tests with the following schema
> with no issue:
> Note: This test was able to run a much faster insertion speed and much bigger
> column sizes (1KB) without any GC issues.
> {code}
> CREATE TABLE test.test (
> row_key text,
> column_key uuid,
> column_value text,
> PRIMARY KEY(row_key, column_key) )
> {code}
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)