[
https://issues.apache.org/jira/browse/CASSANDRA-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268120#comment-15268120
]
Stefania commented on CASSANDRA-9766:
-------------------------------------
It's looking much better without recycling {{BTreeSearchIterator}}:
{code}
grep ERROR build/test/logs/TEST-org.apache.cassandra.streaming.LongStreamingTest.log
ERROR [main] 2016-05-03 10:37:04,004 SLF4J: stderr
ERROR [main] 2016-05-03 10:37:34,737 Writer finished after 25 seconds....
ERROR [main] 2016-05-03 10:37:34,738 File : /tmp/1462243029050-0/cql_keyspace/table1/ma-1-big-Data.db
ERROR [main] 2016-05-03 10:37:55,165 Finished Streaming in 20.41 seconds: 23.52 Mb/sec
ERROR [main] 2016-05-03 10:38:15,054 Finished Streaming in 19.89 seconds: 24.14 Mb/sec
ERROR [main] 2016-05-03 10:38:56,983 Finished Compacting in 41.93 seconds: 23.09 Mb/sec
{code}
I would suggest not recycling {{BTreeSearchIterator}}. I think recycling this
iterator is quite dangerous; see for example
[here|https://github.com/apache/cassandra/compare/trunk...tjake:faster-streaming#diff-81fd7ce7915c147ea84590e25f77ca47R361].
Doing so would significantly extend the scope and risk of this patch for very
little gain, but feel free to prove me wrong if you want to experiment with
alternative recycling options.
Regarding using our own {{FastThreadLocal}} vs. keeping the dependency on
Netty, I'm really not sure. On one hand, I don't want to cause additional work
for no good reason and I don't particularly like duplicating code. On the other
hand, the Netty internal classes, e.g. {{InternalThreadLocalMap}}, could change
at any time, so upgrading Netty could introduce performance regressions, for
example. I'm happy either way.
Regarding ref. counting, you're quite right that we don't need it: if an object
is not recycled, it will simply be GC-ed.
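As a rough illustration of why ref. counting is unnecessary, here is a minimal per-thread pool sketch in plain Java. It is not Netty's {{Recycler}} and not the code in the patch; {{SimpleRecycler}}, {{get()}}, {{recycle()}} and {{maxPerThread}} are hypothetical names chosen for this example. An object that is never handed back, or that is handed back when the pool is full, is simply left for the GC.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

/** Hypothetical per-thread object pool; an assumption for illustration only. */
final class SimpleRecycler<T> {
    private final Supplier<T> factory;
    private final int maxPerThread;
    // One stack per thread, so get() and recycle() need no synchronization.
    private final ThreadLocal<ArrayDeque<T>> stack =
            ThreadLocal.withInitial(ArrayDeque::new);

    SimpleRecycler(Supplier<T> factory, int maxPerThread) {
        this.factory = factory;
        this.maxPerThread = maxPerThread;
    }

    /** Returns a pooled instance if this thread has one, else a fresh one. */
    T get() {
        T pooled = stack.get().pollFirst();
        return pooled != null ? pooled : factory.get();
    }

    /**
     * Offers an instance back to this thread's pool. Returns false when the
     * pool is full; the instance is then dropped and reclaimed by the GC.
     */
    boolean recycle(T obj) {
        ArrayDeque<T> s = stack.get();
        if (s.size() >= maxPerThread) {
            return false;
        }
        s.addFirst(obj);
        return true;
    }
}
```

Nothing tracks outstanding instances here: forgetting to call {{recycle()}} costs one extra allocation later, not a leak.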
A few more points:
* Why do we need to allocate cells lazily in {{BTreeRow.Builder}}? Do we really
create many of these without ever adding cells to them?
* [{{dob.recycle()}}|https://github.com/apache/cassandra/compare/trunk...tjake:faster-streaming#diff-c06541855022eca5fd794dd24ff02f89R182]
should be in a {{finally}} block, since {{serializeRowBody()}} can throw.
* I don't understand [this
line|https://github.com/apache/cassandra/compare/trunk...tjake:faster-streaming#diff-ee37e803d70421ce823d42e02620d589R207]:
when the object is recycled, the buffer should be null (from {{close()}}) and
{{indexSamplesSerializedSize}} should be zero (from {{create()}}), so why do we
need to set {{indexOffsets\[columnIndexCount\] = 0}} explicitly?
* {{ColumnIndex.create()}} is only called in BTW.append. It would be nice if we
could attach this object somewhere rather than constantly pushing it onto and
popping it from the recycler stack. We could just store it in BTW if we could
be sure that BTW.append is never called by multiple threads, or maybe keep a
queue of these objects in BTW?
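To make the {{finally}} point above concrete, here is a minimal sketch of the pattern. It is not the patch's code: {{PooledBuffer}}, {{BodyWriter}} and {{serialize()}} are hypothetical stand-ins for the pooled output buffer and for {{serializeRowBody()}}.

```java
import java.io.IOException;

final class RecycleInFinally {
    /** Hypothetical stand-in for a pooled output buffer. */
    static final class PooledBuffer {
        boolean recycled;

        void recycle() {
            recycled = true;
        }
    }

    /** Hypothetical stand-in for a serialization step that may throw. */
    interface BodyWriter {
        void write(PooledBuffer buf) throws IOException;
    }

    static void serialize(PooledBuffer buf, BodyWriter body) throws IOException {
        try {
            body.write(buf);
        } finally {
            // Runs on both the normal and the exceptional path, so the
            // buffer goes back to the pool even when write() throws.
            buf.recycle();
        }
    }
}
```

Without the {{finally}}, an exception from the write step would skip the recycle call and the buffer would be left for the GC instead of being reused.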
> Bootstrap outgoing streaming speeds are much slower than during repair
> ----------------------------------------------------------------------
>
> Key: CASSANDRA-9766
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9766
> Project: Cassandra
> Issue Type: Improvement
> Components: Streaming and Messaging
> Environment: Cassandra 2.1.2. more details in the pdf attached
> Reporter: Alexei K
> Assignee: T Jake Luciani
> Labels: performance
> Fix For: 3.x
>
> Attachments: problem.pdf
>
>
> I have a cluster in the Amazon cloud; it's described in detail in the attachment.
> What I've noticed is that during bootstrap we never go above 12MB/sec
> transmission speeds, and those speeds flatline, almost as if we're hitting
> some sort of limit (this remains true for other tests that I've run).
> During repair, however, we see much higher, variable sending rates. I've
> provided network charts in the attachment as well. Is there an explanation
> for this? Is something wrong with my configuration, or is it a possible bug?