[jira] [Commented] (CASSANDRA-9304) COPY TO improvements

Tyler Hobbs (JIRA) Tue, 13 Oct 2015 12:32:42 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955515#comment-14955515
 ]


Tyler Hobbs commented on CASSANDRA-9304:
----------------------------------------

The new code comments are very nice, thank you for putting those in.  The rest 
of the changes look pretty good to me as well.

bq.  I removed StringIO.StringIO, I believe io.StringIO is equivalent and 
preferred to StringIO.StringIO? If so, # noqa doesn't seem to be required, the 
first E402 is at line 117.

It looks like my version of flake8 was old and didn't handle {{try/except}} 
style imports well.  Ignore that comment :)

However, there are a couple of other legit flake8 warnings:

{noformat}
bin/cqlsh|2216| W806 local variable 'ks' is assigned to but never used
bin/cqlsh|2217| W806 local variable 'cf' is assigned to but never used
{noformat}

I also tested this out with a 1m row insert from stress and was surprised to 
see that I got some timeouts on multiple ranges.  These were 
{{OperationTimedOut}} errors, so it's not immediately clear where the hangup 
is.  I did notice that the current error handling code loses the original 
exception class (which can be useful), so I suggest changing {{err_callback()}} 
from:

{code}
self.pipe.send((token_range, Exception(err.message)))
{code}

to

{code}
self.pipe.send((token_range, Exception(err.__class__.__name__ + " - " + 
err.message)))
{code}

To avoid the timeouts, I experimented with lowering the page size from 5k to 
1k.  This did resolve the timeouts for me, and also smoothed the throughput.  I 
suggest that we lower the page size (by doing {{session.default_fetch_size = 
N}}) to 1k just to lower the impact on nodes.

Additionally, we probably want to add some basic timeout recovery.  The 
{{err_callback()}} could perform exponential backoff for a limited number of 
attempts if an {{OperationTimedOut}} is thrown.

To handle coordinator-level timeouts, we could subclass 
{{cassandra.policies.RetryPolicy}} with an {{on_read_timeout()}} that performs 
exponential backoff for a limited number of attempts.  You can pass an instance 
of this to the Cluster constructor: {{Cluster(..., retry_policy=foo)}}.

Sorry for the additional work, I just don't want to end up with a {{COPY TO}} 
that goes fast enough to hit timeouts without any sort of recourse for users.  
That's something that we already have a bit of a problem with for {{COPY FROM}}.

> COPY TO improvements
> --------------------
>
>                 Key: CASSANDRA-9304
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9304
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Stefania
>            Priority: Minor
>              Labels: cqlsh
>             Fix For: 3.x, 2.1.x, 2.2.x
>
>
> COPY FROM has gotten a lot of love.  COPY TO not so much.  One obvious 
> improvement could be to parallelize reading and writing (write one page of 
> data while fetching the next).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-9304) COPY TO improvements

Reply via email to