[ https://issues.apache.org/jira/browse/CASSANDRA-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261622#comment-15261622 ]
Stefania commented on CASSANDRA-9766:
-------------------------------------

There are lots of interesting ideas in this patch, but also a lot to digest, so I will need to revisit it again. So far, these are my initial observations:

* Running {{LongStreamingTest}} on my laptop went from 24/25 seconds on trunk HEAD to 22/23 seconds with the patch applied, so not quite a 25% improvement unfortunately. I wonder if the reason is that I'm running on a hybrid HDD rather than an SSD. Would it be possible to collect a few runs and report the average and standard deviation? Flight Recorder profiles for trunk and for the patch would also be useful. I've put the full output of my test runs at the end for your reference.
* At the moment we have dependencies on Netty only where we would expect to find them: in the transport package, {{NativeTransportService}}, the {{QueryOptions}} and {{ResultSet}} codecs, and {{JavaDriverClient}}. With this patch, we will introduce dependencies on Netty's {{FastThreadLocal}} and {{Recycler}} pretty much everywhere in Cassandra. Before we do this, I would like to make sure it is justifiable, and I would probably want the opinion of one more committer with more experience than me. To start with, however, do we have a micro benchmark comparing Netty's {{FastThreadLocal}} and the JDK's {{ThreadLocal}}? I'm also not convinced that the Netty recycler is as optimized as it can be. I understand that it can be very time consuming to implement an optimized pool of objects, but perhaps we should at least produce something quickly based on {{ThreadLocal}} and benchmark it against the Netty recycler, unless we already have sufficient evidence in favor of the Netty recycler.
* Should we perhaps make recyclable objects ref counted, at least for debugging purposes when {{Ref.DEBUG_ENABLED}} is true?

Here are some nits:

* I don't think {{wrap}} in {{ClosableIterable}} is used anywhere.
* In {{StreamingHistogram}} at line 75, {{LongAdder}} also doesn't seem to be used.
* Most imports with wildcards were expanded. I'm not sure if we care about this, or whether we are in favor of one approach or the other.

That's it for now; I hope to have more detailed observations during the next pass. Here is the output of running {{LongStreamingTest}} on my laptop:

{code}
Run from Intellij:
==================

With patch:
===========
ERROR 04:07:38 Writer finished after 28 seconds....
ERROR 04:07:38 File : /tmp/1461816430211-0/cql_keyspace/table1/ma-1-big-Data.db
ERROR 04:08:01 Finished Streaming in 23.32 seconds: 21.62 Mb/sec
ERROR 04:08:24 Finished Streaming in 22.13 seconds: 22.77 Mb/sec
ERROR 04:09:06 Finished Compacting in 42.16 seconds: 23.91 Mb/sec

Without patch:
==============
ERROR 04:13:13 Writer finished after 27 seconds....
ERROR 04:13:13 File : /tmp/1461816765852-0/cql_keyspace/table1/ma-1-big-Data.db
ERROR 04:13:38 Finished Streaming in 24.87 seconds: 19.63 Mb/sec
ERROR 04:14:02 Finished Streaming in 24.17 seconds: 20.19 Mb/sec
ERROR 04:14:43 Finished Compacting in 41.32 seconds: 23.82 Mb/sec

Run from the command line:
==========================

With patch:
===========
ERROR [main] 2016-04-28 12:25:12,394 Writer finished after 28 seconds....
ERROR [main] 2016-04-28 12:25:12,395 File : /tmp/1461817483899-0/cql_keyspace/table1/ma-1-big-Data.db
ERROR [main] 2016-04-28 12:25:35,122 Finished Streaming in 22.73 seconds: 21.83 Mb/sec
ERROR [main] 2016-04-28 12:25:57,284 Finished Streaming in 22.16 seconds: 22.38 Mb/sec
ERROR [main] 2016-04-28 12:26:38,817 Finished Compacting in 41.53 seconds: 24.08 Mb/sec

Without patch:
==============
ERROR [main] 2016-04-28 12:19:51,580 Writer finished after 26 seconds....
ERROR [main] 2016-04-28 12:19:51,580 File : /tmp/1461817165548-0/cql_keyspace/table1/ma-1-big-Data.db
ERROR [main] 2016-04-28 12:20:17,042 Finished Streaming in 25.46 seconds: 19.17 Mb/sec
ERROR [main] 2016-04-28 12:20:41,087 Finished Streaming in 24.04 seconds: 20.30 Mb/sec
ERROR [main] 2016-04-28 12:21:22,610 Finished Compacting in 41.52 seconds: 23.51 Mb/sec
{code}

> Bootstrap outgoing streaming speeds are much slower than during repair
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-9766
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9766
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Streaming and Messaging
>        Environment: Cassandra 2.1.2; more details in the attached pdf
>            Reporter: Alexei K
>            Assignee: T Jake Luciani
>              Labels: performance
>             Fix For: 3.x
>
>         Attachments: problem.pdf
>
>
> I have a cluster in the Amazon cloud; it is described in detail in the attachment.
> What I've noticed is that during bootstrap we never go above 12MB/sec
> transmission speeds, and those speeds flat-line almost as if we're hitting
> some sort of limit (this remains true for other tests that I've run);
> however, during repair we see much higher, variable sending rates. I've
> provided network charts in the attachment as well. Is there an explanation
> for this? Is something wrong with my configuration, or is it a possible bug?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
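The quick {{ThreadLocal}}-based strawman suggested in the comment above could be as small as the sketch below, which would then be wired into a JMH harness alongside Netty's {{Recycler}} for comparison. This is only an illustrative sketch, not Cassandra code: the class and method names ({{SimplePool}}, {{get}}, {{recycle}}) are hypothetical.

{code}
import java.util.ArrayDeque;

/**
 * A minimal thread-local object pool: each thread keeps its own stack of
 * free objects, so get() and recycle() never touch a lock. An object
 * recycled on a different thread simply joins that thread's free list.
 * Sketch only; names are illustrative, not Cassandra's.
 */
final class SimplePool<T>
{
    interface Factory<T> { T newObject(); }

    private final Factory<T> factory;
    private final int maxPerThread;
    private final ThreadLocal<ArrayDeque<T>> freeList =
            ThreadLocal.withInitial(ArrayDeque::new);

    SimplePool(Factory<T> factory, int maxPerThread)
    {
        this.factory = factory;
        this.maxPerThread = maxPerThread;
    }

    T get()
    {
        T pooled = freeList.get().pollFirst();
        return pooled != null ? pooled : factory.newObject();
    }

    void recycle(T obj)
    {
        ArrayDeque<T> free = freeList.get();
        if (free.size() < maxPerThread) // cap per-thread retention
            free.addFirst(obj);         // excess objects are left to the GC
    }

    public static void main(String[] args)
    {
        SimplePool<StringBuilder> pool = new SimplePool<>(StringBuilder::new, 16);
        StringBuilder sb = pool.get();
        pool.recycle(sb);
        // The same thread gets the pooled instance back from its free list.
        System.out.println(pool.get() == sb); // prints "true"
    }
}
{code}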
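On the ref-counting suggestion above: for debugging, recyclable objects could count outstanding handles and fail loudly on a double recycle, in the spirit of {{Ref.DEBUG_ENABLED}}. The sketch below is hypothetical; {{RecyclableBuffer}} and its methods are illustrative, and the debug flag is a stand-in for the real {{Ref.DEBUG_ENABLED}}.

{code}
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Debug-only leak checking for a recyclable object: a reference count
 * guards recycle() so that double recycles and use-after-recycle fail
 * fast instead of silently corrupting the pool. Illustrative sketch only.
 */
class RecyclableBuffer
{
    static final boolean DEBUG_ENABLED = true; // stand-in for Ref.DEBUG_ENABLED

    private final AtomicInteger refs = new AtomicInteger(1); // creator holds one ref

    void retain()
    {
        if (DEBUG_ENABLED && refs.getAndIncrement() <= 0)
            throw new IllegalStateException("retain() after recycle");
    }

    void recycle()
    {
        if (DEBUG_ENABLED)
        {
            int remaining = refs.decrementAndGet();
            if (remaining < 0)
                throw new IllegalStateException("double recycle");
            if (remaining > 0)
                return; // another holder still references this object
        }
        // ... return the object to the pool here ...
    }
}
{code}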