[ 
https://issues.apache.org/jira/browse/CASSANDRA-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177068#comment-15177068
 ] 

Stefania commented on CASSANDRA-11053:
--------------------------------------

Thank you for the latest review. 

Unfortunately there was one more small problem. I noticed it on Windows, but it 
also happens on Linux. If a child process crashes, {{import_records}} does not 
terminate, because the parent process cannot acquire the lock it needs to write 
termination messages to the pipes. The feeder process is blocked on a send and 
never releases that lock. A proper fix would require a bounded semaphore to 
track how many messages are in transit on each pipe. However, since the problem 
only occurs when a child process crashes, and in that case we just want to 
terminate, I added a workaround instead: no termination messages are sent if at 
least one process has crashed, and the processes simply terminate. The only 
consequence should be that profiling results will not be available. 
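
As a rough illustration of the workaround described above (not the code in the 
commit), the sketch below skips the termination messages whenever any child has 
a non-zero exit code. The names {{FakeProcess}}, {{any_crashed}} and 
{{send_termination_messages}} are hypothetical, and {{FakeProcess}} only stands 
in for {{multiprocessing.Process}} so that the example runs without spawning 
children.

```python
from collections import namedtuple

# Hypothetical stand-in for multiprocessing.Process; only exitcode matters here.
FakeProcess = namedtuple("FakeProcess", ["name", "exitcode"])


def any_crashed(processes):
    """Return True if at least one child exited abnormally.

    Mirrors multiprocessing.Process.exitcode: None while the child is still
    running, 0 after a clean exit, non-zero otherwise (negative when the
    child was killed by a signal).
    """
    return any(p.exitcode not in (None, 0) for p in processes)


def send_termination_messages(processes, send):
    """Send a None sentinel to every child unless one of them has crashed.

    `send` is a hypothetical callable (process, message) that writes to the
    child's pipe. Returns the number of messages sent.
    """
    if any_crashed(processes):
        # Writing to the pipes could block forever on the lock still held
        # by a stuck feeder, so skip the messages entirely.
        return 0
    for p in processes:
        send(p, None)
    return len(processes)
```

The trade-off is the one noted above: children that never receive the sentinel 
cannot report anything on shutdown, so profiling results are lost in the crash 
case.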

Please check [this 
commit|https://github.com/stef1927/cassandra/commit/7186cf803fe6cff126b310d7b7785623688b9aa4].

I've restarted CI on all branches.

> COPY FROM on large datasets: fix progress report and debug performance
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-11053
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11053
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>         Attachments: copy_from_large_benchmark.txt, 
> copy_from_large_benchmark_2.txt, parent_profile.txt, parent_profile_2.txt, 
> worker_profiles.txt, worker_profiles_2.txt
>
>
> Running COPY FROM on a large dataset (20G divided into 20M records) revealed 
> two issues:
> * The progress report is incorrect: it advances very slowly until almost the 
> end of the test, at which point it catches up extremely quickly.
> * The performance in rows per second is similar to running smaller tests with 
> a smaller cluster locally (approx 35,000 rows per second). As a comparison, 
> cassandra-stress manages 50,000 rows per second under the same set-up, 
> which makes it about 1.5 times faster. 
> See attached file _copy_from_large_benchmark.txt_ for the benchmark details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
