[
https://issues.apache.org/jira/browse/CASSANDRA-11105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ralf Steppacher updated CASSANDRA-11105:
----------------------------------------
Description:
I am using Cassandra 2.2.4 and I am struggling to get the cassandra-stress tool
to work for my test scenario. I have followed the example on
http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema
to create a yaml file describing my test (attached).
I am collecting events per user id (text, partition key). Events have a session
type (text), event type (text), and creation time (timestamp) as clustering
keys, in that order, plus some more attributes required for rendering the
events in a UI. For testing purposes I ended up with the following column spec
and insert distribution:
{noformat}
columnspec:
  - name: created_at
    cluster: uniform(10..10000)
  - name: event_type
    size: uniform(5..10)
    population: uniform(1..30)
    cluster: uniform(1..30)
  - name: session_type
    size: fixed(5)
    population: uniform(1..4)
    cluster: uniform(1..4)
  - name: user_id
    size: fixed(15)
    population: uniform(1..1000000)
  - name: message
    size: uniform(10..100)
    population: uniform(1..100B)
insert:
  partitions: fixed(1)
  batchtype: UNLOGGED
  select: fixed(1)/1200000
{noformat}
Running the stress tool for just the insert prints
{noformat}
Generating batches with [1..1] partitions and [0..1] rows (of [10..1200000] total rows in the partitions)
{noformat}
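The ranges in that line appear to follow from the column spec. The Python sketch below is a reading of the output, not something checked against the stress tool's source. It assumes that the total rows per partition is the product of the {{cluster}} distributions of the clustering columns and that the {{select}} ratio is applied to that total with integer truncation:

```python
from math import prod

# (min, max) of each clustering column's `cluster` distribution, copied from the column spec.
cluster_ranges = {
    "created_at": (10, 10000),
    "event_type": (1, 30),
    "session_type": (1, 4),
}

# Assumption: rows per partition = product of the per-column cluster counts.
min_rows = prod(lo for lo, _ in cluster_ranges.values())
max_rows = prod(hi for _, hi in cluster_ranges.values())

# select: fixed(1)/1200000, kept as an integer ratio to avoid float rounding.
select_num, select_den = 1, 1200000

# Assumption: rows selected per batch = total rows * ratio, truncated.
min_selected = min_rows * select_num // select_den
max_selected = max_rows * select_num // select_den

print(f"total rows in the partitions: [{min_rows}..{max_rows}]")
print(f"rows per batch: [{min_selected}..{max_selected}]")
```

Under these assumptions the sketch reproduces the {{[10..1200000]}} and {{[0..1]}} ranges printed by the tool.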
and then immediately starts flooding me with
{{com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large}}.
I do not understand why I should be exceeding the
{{batch_size_fail_threshold_in_kb: 50}} in the {{cassandra.yaml}}. My
understanding is that the stress tool should generate one row per batch. The
size of a single row should not exceed {{8+10*3+5*3+15*3+100*3 = 398 bytes}},
assuming a worst case of all text characters being 3-byte Unicode characters.
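The estimate can be written out as a short Python sketch. It counts payload bytes only; any per-row or per-batch overhead the server may include in its own size calculation is ignored, which is an assumption of this sketch:

```python
# Worst-case payload size of one row, following the estimate in the text.
BYTES_PER_CHAR = 3    # assumed worst case: every text character takes 3 bytes
TIMESTAMP_BYTES = 8   # created_at

# Maximum character counts of the text columns, taken from the column spec.
max_text_chars = {
    "event_type": 10,
    "session_type": 5,
    "user_id": 15,
    "message": 100,
}

row_bytes = TIMESTAMP_BYTES + sum(n * BYTES_PER_CHAR for n in max_text_chars.values())

# batch_size_fail_threshold_in_kb: 50
threshold_bytes = 50 * 1024

# How many worst-case rows would fit below the fail threshold.
rows_under_threshold = threshold_bytes // row_bytes

print(f"worst-case row: {row_bytes} bytes")
print(f"threshold: {threshold_bytes} bytes")
print(f"worst-case rows that fit under the threshold: {rows_under_threshold}")
```

By this estimate a batch would need well over a hundred worst-case rows before it reached the threshold, so a single-row batch should not come close.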
This is how I start the attached user scenario:
{noformat}
[rsteppac@centos bin]$ ./cassandra-stress user profile=../batch_too_large.yaml ops\(insert=1\) -log level=verbose file=~/centos_event_by_patient_session_event_timestamp_insert_only.log -node 10.211.55.8
INFO 08:00:07 Did not find Netty's native epoll transport in the classpath, defaulting to NIO.
INFO 08:00:08 Using data-center name 'datacenter1' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor)
INFO 08:00:08 New Cassandra host /10.211.55.8:9042 added
Connected to cluster: Titan_DEV
Datatacenter: datacenter1; Host: /10.211.55.8; Rack: rack1
Created schema. Sleeping 1s for propagation.
Generating batches with [1..1] partitions and [0..1] rows (of [10..1200000] total rows in the partitions)
com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large
    at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)
    at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:271)
    at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:185)
    at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:55)
    at org.apache.cassandra.stress.operations.userdefined.SchemaInsert$JavaDriverRun.run(SchemaInsert.java:87)
    at org.apache.cassandra.stress.Operation.timeWithRetry(Operation.java:159)
    at org.apache.cassandra.stress.operations.userdefined.SchemaInsert.run(SchemaInsert.java:119)
    at org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:309)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large
    at com.datastax.driver.core.Responses$Error.asException(Responses.java:125)
    at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:120)
    at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:186)
    at com.datastax.driver.core.RequestHandler.access$2300(RequestHandler.java:45)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:752)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:576)
    at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1003)
    at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:925)
    at com.datastax.shaded.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
    at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
    at com.datastax.shaded.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
    at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
    at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
    at com.datastax.shaded.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
    at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
    at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
    at com.datastax.shaded.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
    at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
    at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
    at com.datastax.shaded.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
    at com.datastax.shaded.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
    at com.datastax.shaded.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at com.datastax.shaded.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at com.datastax.shaded.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at com.datastax.shaded.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at com.datastax.shaded.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:745)
...
{noformat}
The C* log:
{noformat}
INFO 08:00:04 Listening for thrift clients...
WARN 08:00:07 Detected connection using native protocol version 2. Both version 1 and 2 of the native protocol are now deprecated and support will be removed in Cassandra 3.0. You are encouraged to upgrade to a client driver using version 3 of the native protocol
ERROR 08:00:14 Batch of prepared statements for [stresscql.batch_too_large] is of size 58024, exceeding specified threshold of 51200 by 6824. (see batch_size_fail_threshold_in_kb)
ERROR 08:00:15 Batch of prepared statements for [stresscql.batch_too_large] is of size 77985, exceeding specified threshold of 51200 by 26785. (see batch_size_fail_threshold_in_kb)
...
{noformat}
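The batch sizes the server reports can be set against the 398-byte worst-case row estimate. The Python sketch below parses the two ERROR lines, checks the reported excess, and derives how many worst-case rows each batch would need to contain. It assumes the server-reported size is comparable to the raw row payload; the regular expression and variable names are illustrative only:

```python
import re

# The two ERROR lines from the C* log, unwrapped.
log_lines = [
    "ERROR 08:00:14 Batch of prepared statements for [stresscql.batch_too_large] "
    "is of size 58024, exceeding specified threshold of 51200 by 6824. "
    "(see batch_size_fail_threshold_in_kb)",
    "ERROR 08:00:15 Batch of prepared statements for [stresscql.batch_too_large] "
    "is of size 77985, exceeding specified threshold of 51200 by 26785. "
    "(see batch_size_fail_threshold_in_kb)",
]

pattern = re.compile(r"is of size (\d+), exceeding specified threshold of (\d+) by (\d+)")
WORST_CASE_ROW_BYTES = 398  # estimate from the description

results = []
for line in log_lines:
    match = pattern.search(line)
    size, threshold, excess = (int(g) for g in match.groups())
    # The reported excess should equal size minus threshold.
    assert size - threshold == excess
    # Integer ceiling division: fewest worst-case rows that add up to `size`.
    min_rows = -(-size // WORST_CASE_ROW_BYTES)
    results.append((size, excess, min_rows))
    print(f"size={size} excess={excess} implies at least {min_rows} worst-case rows")
```

Under that assumption both rejected batches correspond to well over a hundred rows each, which does not match the expected one row per batch.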
> cassandra-stress tool - InvalidQueryException: Batch too large
> --------------------------------------------------------------
>
> Key: CASSANDRA-11105
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11105
> Project: Cassandra
> Issue Type: Bug
> Components: Tools
> Environment: Cassandra 2.2.4, Java 8, CentOS 6.5
> Reporter: Ralf Steppacher
> Attachments: batch_too_large.yaml
>
>