[ https://issues.apache.org/jira/browse/CASSANDRA-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966519#comment-14966519 ]

Stefania commented on CASSANDRA-9304:
-------------------------------------

The latest changes are ready for review.

In the end I've decided to implement exponential back-off only for server-side
timeouts, that is, only in the retry policy. For driver timeouts
({{OperationTimedOut}}) retrying is problematic: not only do we need to keep
track of how many pages we've already received, but we may also retrieve more
data from the server than expected, which results in duplicated data.

So instead I increased the timeout in proportion to the page size (10 seconds
per 1000 entries of page size at the moment, although this may be a bit too
much). This should eliminate the driver-side timeouts that cause extra data to
be received from the server. If {{OperationTimedOut}} is still received, it
then signals a real connection problem.

In this case, the parent process may resubmit the same token range later on,
up to a maximum number of times, provided that we have received no data for
that range yet. This applies to any error reported for a range by a worker
process. If we have already received data for that range, I decided against
retrying to avoid duplicating data. I hope this makes sense; let me know if
you have other preferences on how to implement the back-off and retry
mechanism.
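A minimal Python sketch of these rules, for illustration only: the helper
names ({{backoff_delay}}, {{server_timeout_decision}}, {{page_timeout}},
{{should_resubmit}}) and their defaults are hypothetical, and the sketch models
only the decisions, not the actual cqlsh or driver code.

```python
# Hypothetical sketch of the back-off and retry rules; names and defaults
# are illustrative assumptions, not the cqlsh implementation.


def backoff_delay(retry_num, base_delay=0.1, max_delay=10.0):
    """Exponential back-off: base_delay * 2**retry_num, capped at max_delay."""
    return min(base_delay * (2 ** retry_num), max_delay)


def server_timeout_decision(retry_num, max_retries=3, base_delay=0.1):
    """For a server-side timeout, return the delay before retrying, or None to give up."""
    if retry_num >= max_retries:
        return None
    return backoff_delay(retry_num, base_delay)


def page_timeout(page_size, seconds_per_1000=10.0):
    """Driver-side timeout that grows linearly with the page size."""
    return seconds_per_1000 * page_size / 1000.0


def should_resubmit(attempts, max_attempts, rows_received):
    """The parent resubmits a failed token range only if it produced no rows yet."""
    return rows_received == 0 and attempts < max_attempts


# Usage:
# page_timeout(1000)         -> 10.0
# page_timeout(5000)         -> 50.0
# should_resubmit(1, 3, 0)   -> True
# should_resubmit(1, 3, 250) -> False (rows already exported, avoid duplicates)
```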

I've also done the following:

* enhanced debug messages and error logging 
* fixed COPY command completions
* added monitoring of child processes in case they die without sending the 
termination flag on the pipe
* fixed possible concurrent access to {{ExportSession.jobs}}
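The child process monitoring can be sketched as a periodic check in the
parent. This assumes the workers are {{multiprocessing.Process}} objects (which
expose {{is_alive()}} and {{exitcode}}) and that the parent records which
workers have already sent the termination flag; {{find_dead_workers}} is a
hypothetical name.

```python
# Hypothetical sketch: detect workers that died without sending the
# termination flag on their pipe.


def find_dead_workers(processes, finished):
    """Return (name, exitcode) for workers that exited without signalling completion.

    processes: dict mapping a worker name to a process-like object with
               is_alive() and exitcode (e.g. multiprocessing.Process).
    finished:  set of worker names that already sent the termination flag.
    """
    dead = []
    for name, proc in processes.items():
        # A worker that is no longer alive but never reported completion
        # must have died unexpectedly (crash, kill, etc.).
        if name not in finished and not proc.is_alive():
            dead.append((name, proc.exitcode))
    return dead
```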

Still to do:

* Moving the code to a separate file
* Testing on Windows

> COPY TO improvements
> --------------------
>
>                 Key: CASSANDRA-9304
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9304
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Stefania
>            Priority: Minor
>              Labels: cqlsh
>             Fix For: 3.x, 2.1.x, 2.2.x
>
>
> COPY FROM has gotten a lot of love.  COPY TO not so much.  One obvious 
> improvement could be to parallelize reading and writing (write one page of 
> data while fetching the next).
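The pipelining suggested in the description (write one page of data while
fetching the next) could look roughly like the following producer/consumer
sketch, using a background thread and a bounded {{queue.Queue}}.
{{fetch_pages}} and {{write_page}} are hypothetical stand-ins for the paged
query results and the CSV writer.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the result set


def export_pipelined(fetch_pages, write_page, max_buffered=2):
    """Write each page on this thread while a background thread fetches the next.

    fetch_pages: iterable yielding pages (e.g. lists of rows); hypothetical.
    write_page:  callable writing one page (e.g. to a CSV file); hypothetical.
    """
    pages = queue.Queue(maxsize=max_buffered)
    errors = []

    def producer():
        try:
            for page in fetch_pages:
                pages.put(page)  # blocks when max_buffered pages are waiting
        except Exception as exc:  # hand fetch errors over to the writer side
            errors.append(exc)
        finally:
            pages.put(_DONE)

    # Daemon thread: if write_page raises, a producer blocked on put() will
    # not keep the interpreter alive (acceptable for a sketch).
    reader = threading.Thread(target=producer, daemon=True)
    reader.start()
    while True:
        page = pages.get()
        if page is _DONE:
            break
        write_page(page)
    reader.join()
    if errors:
        raise errors[0]


# Usage:
# written = []
# export_pipelined(iter([[1, 2], [3, 4], [5]]), written.append)
# written == [[1, 2], [3, 4], [5]]
```

The bounded queue keeps at most {{max_buffered}} pages in memory, so a slow
writer throttles fetching instead of buffering the whole table.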



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
