[
https://issues.apache.org/jira/browse/CASSANDRA-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955515#comment-14955515
]
Tyler Hobbs commented on CASSANDRA-9304:
----------------------------------------
The new code comments are very nice, thank you for putting those in. The rest
of the changes look pretty good to me as well.
bq. I removed StringIO.StringIO, I believe io.StringIO is equivalent and
preferred to StringIO.StringIO? If so, # noqa doesn't seem to be required, the
first E402 is at line 117.
It looks like my version of flake8 was old and didn't handle {{try/except}}
style imports well. Ignore that comment :)
However, there are a couple of other legit flake8 warnings:
{noformat}
bin/cqlsh|2216| W806 local variable 'ks' is assigned to but never used
bin/cqlsh|2217| W806 local variable 'cf' is assigned to but never used
{noformat}
I also tested this out with a 1m row insert from stress and was surprised to
see that I got some timeouts on multiple ranges. These were
{{OperationTimedOut}} errors, so it's not immediately clear where the hangup
is. I did notice that the current error handling code loses the original
exception class (which can be useful), so I suggest changing {{err_callback()}}
from:
{code}
self.pipe.send((token_range, Exception(err.message)))
{code}
to
{code}
self.pipe.send((token_range, Exception(err.__class__.__name__ + " - " +
err.message)))
{code}
To avoid the timeouts, I experimented with lowering the page size from 5k to
1k. This did resolve the timeouts for me, and also smoothed the throughput. I
suggest that we lower the page size (by doing {{session.default_fetch_size =
N}}) to 1k just to lower the impact on nodes.
Additionally, we probably want to add some basic timeout recovery. The
{{err_callback()}} could perform exponential backoff for a limited number of
attempts if an {{OperationTimedOut}} is thrown.
To handle coordinator-level timeouts, we could subclass
{{cassandra.policies.RetryPolicy}} with an {{on_read_timeout()}} that performs
exponential backoff for a limited number of attempts. You can pass an instance
of this to the Cluster constructor: {{Cluster(..., retry_policy=foo)}}.
Sorry for the additional work, I just don't want to end up with a {{COPY TO}}
that goes fast enough to hit timeouts without any sort of recourse for users.
That's something that we already have a bit of a problem with for {{COPY FROM}}.
> COPY TO improvements
> --------------------
>
> Key: CASSANDRA-9304
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9304
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Stefania
> Priority: Minor
> Labels: cqlsh
> Fix For: 3.x, 2.1.x, 2.2.x
>
>
> COPY FROM has gotten a lot of love. COPY TO not so much. One obvious
> improvement could be to parallelize reading and writing (write one page of
> data while fetching the next).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)