[
https://issues.apache.org/jira/browse/CASSANDRA-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966519#comment-14966519
]
Stefania commented on CASSANDRA-9304:
-------------------------------------
The latest changes are ready for review.
In the end I've decided to implement exponential back-off only for server-side
timeouts, that is, only in the retry policy. For driver timeouts,
{{OperationTimedOut}}, it is problematic to retry because not only do we need
to keep track of how many pages we've already received, but we may also
retrieve more data from the server, which results in duplicated data. What I
did instead is scale the timeout with the page size (currently 10 seconds per
1000 entries in the page size, although this may be a bit too much). This
should eliminate driver-side timeouts that result in more data being received
from the server. {{OperationTimedOut}}, if still received, would then signal a
real connection problem. In this case, it is the parent process that may
resubmit the same token range later on, up to a maximum number of times and
provided that we have received no data yet. The same applies to any error
reported for a range by a worker process. If we have already received data for
that range, I decided against retrying, to avoid duplicating data. I hope
this makes sense; let me know if you have other preferences on how to
implement the back-off and retry mechanism.
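To make the three rules above concrete, here is a rough sketch rather than the code in the patch. The function names, the back-off base and cap, and the default of 5 attempts are all assumptions for illustration; only the "10 seconds per 1000 entries" figure comes from the description above.

```python
def backoff_delay(attempt, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based).

    The delay doubles on every attempt and is capped; `base` and `cap`
    are illustrative values, not taken from the patch.
    """
    return min(cap, base * (2 ** attempt))


def page_timeout(page_size, seconds_per_1000=10.0):
    """Driver-side timeout in seconds, growing linearly with the page size."""
    return seconds_per_1000 * page_size / 1000.0


def should_resubmit(attempts, rows_received, max_attempts=5):
    """Whether the parent may resubmit a failed token range.

    A range is only retried if no rows were received for it (to avoid
    duplicated data) and the attempt limit has not been reached.
    `max_attempts` is a hypothetical default.
    """
    return rows_received == 0 and attempts < max_attempts
```

With these assumptions a page size of 5000 gets a 50 second timeout, and a range that has already produced rows is never resubmitted, whatever the error.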
I've also done the following:
* enhanced debug messages and error logging
* fixed COPY command completions
* added monitoring of child processes in case they die without sending the
termination flag on the pipe
* fixed possible concurrent access to {{ExportSession.jobs}}
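For the child process monitoring, the idea can be sketched as follows. This is only an illustration under my own assumptions: {{find_dead_workers}} and its arguments are hypothetical names, and the workers are assumed to be already started objects exposing {{is_alive()}}, as {{multiprocessing.Process}} does.

```python
def find_dead_workers(processes, finished):
    """Return the ids of workers that exited without signalling completion.

    processes: dict mapping a worker id to an object with is_alive(),
               for example a started multiprocessing.Process
    finished:  set of worker ids whose termination flag was already
               read from the pipe
    """
    return sorted(
        worker_id
        for worker_id, proc in processes.items()
        if worker_id not in finished and not proc.is_alive()
    )
```

The parent would call something like this periodically while waiting on the pipe, and treat any id it returns as a failed worker instead of blocking forever on a termination flag that will never arrive.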
Still to do:
* move the code to a separate file
* test on Windows
> COPY TO improvements
> --------------------
>
> Key: CASSANDRA-9304
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9304
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Stefania
> Priority: Minor
> Labels: cqlsh
> Fix For: 3.x, 2.1.x, 2.2.x
>
>
> COPY FROM has gotten a lot of love. COPY TO not so much. One obvious
> improvement could be to parallelize reading and writing (write one page of
> data while fetching the next).