Re: Cassandra on SLES 15?

2023-03-09 Thread Elliott Sims via user
A quick search shows SLES 15 provides Java 11 (java-11-openjdk), which is
just fine for Cassandra 4.x.
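For anyone else making the same move, a rough sketch of the check/install on
SLES 15 (package name per the search above; exact versions will vary by
service pack):

    # confirm the package and version zypper would pull in
    zypper search --details java-11-openjdk
    # install it and verify what lands on the PATH
    sudo zypper install java-11-openjdk
    java -version   # should report something like: openjdk version "11.0.x"

Cassandra 4.x supports Java 8 and Java 11, so as long as java -version
reports 11 you should be in supported territory.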

On Wed, Mar 8, 2023 at 2:56 PM Eric Ferrenbach <
eric.ferrenb...@milliporesigma.com> wrote:

> We are running Cassandra 4.0.7.
>
> We are preparing to migrate our nodes from Centos to SUSE Linux.
>
>
>
> This page only mentions SLES 12 (not 15)
>
>
> https://cassandra.apache.org/doc/latest/cassandra/getting_started/installing.html
>
>
>
> This states SLES 12 Active support ends next year:
>
> https://endoflife.date/sles
>
>
>
> Does anyone have any information on running Cassandra 4 on SLES 15?
>
> Is this being tested anywhere?
>
>
>
> Thank you in advance,
>
> Eric



TOO_MANY_KEY_UPDATES error with TLS

2023-04-12 Thread Elliott Sims via user
A few weeks ago, we rolled out TLS among hosts in our clusters (running
4.0.7).  More recently, we also rolled out TLS between Cassandra clients and
the cluster.  Today, we started seeing a lot of dropped messages in one
cluster that correlate with warnings like this:

WARN  [epollEventLoopGroup-5-31] 2023-04-12 15:43:34,476 PreV5Handlers.java:261 - Unknown exception in client networking
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: error:1104:SSL routines:OPENSSL_internal:TOO_MANY_KEY_UPDATES
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:478)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
        at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
        at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: javax.net.ssl.SSLException: error:1104:SSL routines:OPENSSL_internal:TOO_MANY_KEY_UPDATES
        at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.shutdownWithError(ReferenceCountedOpenSslEngine.java:1028)
        at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.sslReadErrorResult(ReferenceCountedOpenSslEngine.java:1321)
        at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1270)
        at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1346)
        at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1389)
        at io.netty.handler.ssl.SslHandler$SslEngineType$1.unwrap(SslHandler.java:206)
        at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1387)
        at io.netty.handler.ssl.SslHandler.decodeNonJdkCompatible(SslHandler.java:1294)
        at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1331)
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:508)
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:447)
        ... 15 common frames omitted

INFO  [ScheduledTasks:1] 2023-04-12 15:46:19,701 MessagingMetrics.java:206 - READ_RSP messages were dropped in last 5000 ms: 0 internal and 3 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 5960 ms

This looks similar to a bug in OpenSSL fixed in 2019
(https://github.com/openssl/openssl/pull/8299), but the equivalent change
doesn't seem to have been ported over to BoringSSL.  Has anyone else run
across this, or found some sort of workaround?
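
Until there's a real fix, a couple of untested sketches of what we're
considering.  KeyUpdate is a TLS 1.3-only post-handshake message, so in
theory keeping the client-facing listener on TLS 1.2 sidesteps it entirely;
treat the exact yaml key/values below as assumptions to verify against your
version's cassandra.yaml:

    # gauge how widespread it is (log path is an assumption):
    grep -c 'TOO_MANY_KEY_UPDATES' /var/log/cassandra/system.log

    # possible mitigation: pin client-facing TLS to 1.2 in cassandra.yaml,
    # then bounce nodes one at a time:
    #   client_encryption_options:
    #     ...
    #     accepted_protocols: [TLSv1.2]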



Re: TOO_MANY_KEY_UPDATES error with TLS

2023-04-12 Thread Elliott Sims via user
Update to this: per https://github.com/openssl/openssl/issues/8068, it looks
like BoringSSL should avoid this issue, so it may be related to client
behavior of some sort.  It's unclear to me from the log message whether it's
intra-cluster traffic or client/cluster traffic generating the error.
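
A rough way to confirm, if I'm reading the stack right: PreV5Handlers is part
of the native-protocol (client) path, so every occurrence wrapped in "Unknown
exception in client networking" should be client-facing traffic.  Comparing
counts might settle it (log path is an assumption, and each trace repeats the
error string, so expect the second number to be roughly double the first if
it's all client-side):

    # errors reported by the client-protocol handler
    grep -c 'Unknown exception in client networking' /var/log/cassandra/system.log
    # total occurrences of the TLS error itself
    grep -c 'TOO_MANY_KEY_UPDATES' /var/log/cassandra/system.log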

On Wed, Apr 12, 2023 at 11:36 AM Elliott Sims  wrote:

> A few weeks ago, we rolled out TLS among hosts in our clusters (running
> 4.0.7).  More recently, we also rolled out TLS between Cassandra clients
> and the cluster.  Today, we started seeing a lot of dropped messages in
> one cluster that correlate with warnings like this:
>
> [full stack trace and dropped-message log line snipped; identical to the
> original message above]



Re: Cassandra p95 latencies

2023-08-14 Thread Elliott Sims via user
1.  Check for a Nagle/delayed-ack interaction, though the driver is probably
setting TCP_NODELAY, so it shouldn't be a problem.
2.  Check for network latency (just regular old ping among hosts, during
traffic).
3.  Check your GC metrics and see if garbage collections line up with the
outliers.  Some tuning can help there, depending on the pattern, but 40ms
p99 at least would be fairly normal for G1GC.
4.  Check actual local write times, and I/O times with iostat.  If you have
spinning drives, 40ms is fairly expected.  It's high but not totally
unexpected for consumer-grade SSDs; for enterprise-grade SSDs, commit times
that long would be very unusual.  What are your commitlog_sync settings?
(Rough commands for 2 and 4 are sketched below.)
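
Concretely, something like this is what I have in mind (paths are
assumptions; adjust for your install):

    # 2. network latency among nodes, ideally while under load
    ping -c 100 <other-node-ip>

    # 4. per-device I/O latency; watch await/w_await and %util
    iostat -x 5

    # and the commitlog settings in question:
    grep -E '^commitlog_sync' /etc/cassandra/cassandra.yaml
    # the 4.0 default is periodic mode:
    #   commitlog_sync: periodic
    #   commitlog_sync_period_in_ms: 10000

And for Jeff's trace request downthread: running "TRACING ON;" and
"CONSISTENCY LOCAL_QUORUM;" in cqlsh before one of the slow queries will
show where the time actually goes.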

On Mon, Aug 14, 2023 at 8:43 AM Josh McKenzie  wrote:

> The queries are rightly designed
>
> Data modeling in Cassandra is 100% gray space; unfortunately, there is no
> right or wrong design. You'll need to share the basic shapes / contours of
> your data model for other folks to help you; seemingly innocuous things in
> a data model can cause unexpected issues w/C*'s storage engine paradigm,
> thanks to the partitioning and data storage happening under the hood.
>
> If you were seeing single-digit ms on 3.0.X or 3.11.X and 40ms p95 on 4.0,
> I'd immediately look to the DB as being the culprit. For all other cases,
> you should be seeing single-digit ms, as queries in C* generally boil down
> to key/value lookups (partition key) to a list of rows you either point
> query (key/value #2) or range scan via clustering keys and pull back out.
>
> There's also paging to take into consideration (whether you're using it or
> not, what your page size is) and the data itself (do you have thousands of
> columns? Multi-MB blobs you're pulling back out? etc). All can play into
> this.
>
> On Fri, Aug 11, 2023, at 3:40 PM, Jeff Jirsa wrote:
>
> You’re going to have to help us help you
>
> 4.0 is pretty widely deployed. I’m not aware of a perf regression
>
> Can you give us a schema (anonymized) and queries and show us a trace?
>
>
> On Aug 10, 2023, at 10:18 PM, Shaurya Gupta 
> wrote:
>
> 
> The queries are rightly designed, as I already explained. 40 ms is way too
> high compared to what I've seen with other DBs and, many times, with
> Cassandra 3.x versions.
> CPU consumption, as I mentioned, is not high; it is around 20%.
>
> On Thu, Aug 10, 2023 at 5:14 PM MyWorld  wrote:
>
> Hi,
> P95 should not be a problem if the data model is rightly designed. Leveled
> compaction strategy further reduces this; however, it consumes some
> resources. For reads, caching is also helpful.
> Can you check your CPU iowait, as it could be the reason for the delay?
>
> Regards,
> Ashish
>
> On Fri, 11 Aug, 2023, 04:58 Shaurya Gupta,  wrote:
>
> Hi community
>
> What is the expected P95 latency for Cassandra read and write queries
> executed with LOCAL_QUORUM over a table with 3 replicas? The queries are
> done using the partition + clustering key, and the row size in bytes is
> not too much, maybe 1-2 KB maximum.
> Assuming CPU is not a bottleneck?
>
> We observe those to be 40 ms at P95 for reads, and the same for writes.
> This looks very high compared to what we expected. We are using Cassandra
> 4.0.
>
> Any documentation / numbers will be helpful.
>
> Thanks
> --
> Shaurya Gupta
>
>
>
> --
> Shaurya Gupta
>
>
>
