[
https://issues.apache.org/jira/browse/CASSANDRA-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006279#comment-15006279
]
Stefania commented on CASSANDRA-9304:
-------------------------------------
Thank you for the review. Here is the latest update:
bq. In {{write_rows_to_csv()}}, instead of using {{sys.exec_info()\[0\]}}, it
seems like we could just use the caught exception (by doing {{except Exception,
e:}}). Am I missing a reason why that won't work here?
No reason, I changed it, thanks.
{quote}
In {{get_ranges()}}, this line will result in two token ranges using the same
dict, so the count of attempts and rows can become incorrect:
{code}
ranges[(previous, None)] =
{code}
Just using {{.copy()}} on the dict should be fine.
{quote}
Fixed, thank you.
bq. In {{report_error()}}, the {{issubclass()}} check should be
{{isinstance()}} instead.
Changed, thanks.
{quote}
If a child process dies, we don't (and can't) make any adjustments to the
suceeded/failed count. Because the jobs that were queued to that process aren't
resubmitted, this results in the copy process looping endlessly. I see two
possible solutions:
# Track which token ranges we've submitted to each process. When a child
process dies, re-submit any of those token ranges for which we have 0 rows.
# Error the entire copy process
I'm personally okay with option #2, so feel free to go with that if you don't
want to do the work for #1.
{quote}
It's not easy to track which process is handling a range because we are using a
shared queue where each process takes the next job whenever it is ready to
accept another job. So I've opted for option 2 since if a process dies chances
are the other processes will die too.
{quote}
I was able to find these bugs by editing the code to some simple fault
injection in start_job():
{code}
if random.random() > 0.7:
future = session.execute_async("SELECT * FROM badtable")
elif random.random() > 0.99:
sys.exit(1)
else:
future = session.execute_async(query)
{code}
I think it would be a good idea to expose something like this for testing
through an environment variable. That would allow us to easily exercise these
different error scenarios in dtests that are otherwise hard to replicate.
{quote}
Done. I've also created a
[dtest|https://github.com/stef1927/cassandra-dtest/commit/a629352aa4735baeef42aae81c3bb098e45b77db]
but due to the random nature of failure injection there isn't much to check
other than checking that it doesn't time-out. Do you think this test is useful
at all?
I've relaunched one more set of CI jobs, they are still running.
> COPY TO improvements
> --------------------
>
> Key: CASSANDRA-9304
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9304
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Jonathan Ellis
> Assignee: Stefania
> Priority: Minor
> Labels: cqlsh
> Fix For: 2.1.x, 2.2.x, 3.x
>
>
> COPY FROM has gotten a lot of love. COPY TO not so much. One obvious
> improvement could be to parallelize reading and writing (write one page of
> data while fetching the next).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)