[ 
https://issues.apache.org/jira/browse/CASSANDRA-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156755#comment-15156755
 ] 

Stefania commented on CASSANDRA-11053:
--------------------------------------

bq. I would like to repeat the tests on an AWS instance with twice the number 
of cores, to see how much we can scale. 

I've tested with an {{r3.4xlarge}} (16 cores) and with VNODES enabled (256 
tokens per host). These tests have highlighted two further areas of improvement:

* One of the initial optimizations, batching by ring position, is not as 
effective with VNODES because of the increased number of ring positions: with 
2048 (8x256) positions we cannot effectively fill batches of chunk size 5000 
if all partition keys are unique. I've therefore taken a step back and 
re-introduced batching by replica, albeit in a way that does not impact the 
case where we have sufficient rows for a given ring position (see the first 
sketch after this list). I've also introduced a new load balancing policy to 
avoid converting a partition key into a token twice (once during batching and 
once when the driver queries the default token-aware policy).

* I noticed that on the larger machine the worker processes were spending too 
much time waiting to receive data. Having the parent process both read files 
and wait for results had become a careful balancing act of how much time to 
dedicate to receiving results: too much time and the worker processes get 
starved, too little and the progress report lags behind. Since Python threads 
run sequentially due to the GIL, I decided to move file reading into a 
separate child process (see the second sketch below).
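
To illustrate the first point, here is a minimal sketch of batching rows by 
replica rather than by ring position. The {{replica_for}} callable is a 
hypothetical stand-in for a lookup against the driver's token map, and the 
chunk size of 5000 is simply the value discussed above; the real copyutil 
code differs.

{code}
from collections import defaultdict

CHUNK_SIZE = 5000  # rows per batch, the value discussed above

def batches_by_replica(rows, replica_for, chunk_size=CHUNK_SIZE):
    """Group (partition_key, row) pairs by the primary replica of the key,
    emitting a batch as soon as it reaches chunk_size.

    replica_for is a hypothetical callable mapping a partition key to its
    primary replica, e.g. via the driver's token map."""
    pending = defaultdict(list)
    for pk, row in rows:
        replica = replica_for(pk)
        batch = pending[replica]
        batch.append(row)
        if len(batch) >= chunk_size:
            yield replica, batch
            pending[replica] = []
    # flush partially filled batches once the input is exhausted
    for replica, batch in pending.items():
        if batch:
            yield replica, batch
{code}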

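And a minimal sketch of the second point: a dedicated child process reads the 
input files and feeds chunks to the parent through a bounded queue, leaving 
the parent free to dispatch work and collect results. 
{{dispatch_to_workers}} and {{report_progress}} are hypothetical placeholders 
for the existing copyutil machinery, not its actual functions.

{code}
import sys
import multiprocessing as mp

def read_files(paths, chunk_queue, chunk_size=5000):
    """Reader child process: stream lines from the input files and push
    fixed-size chunks onto the queue, followed by a None sentinel."""
    chunk = []
    for path in paths:
        with open(path) as f:
            for line in f:
                chunk.append(line)
                if len(chunk) >= chunk_size:
                    chunk_queue.put(chunk)
                    chunk = []
    if chunk:
        chunk_queue.put(chunk)
    chunk_queue.put(None)  # signal end of input

def dispatch_to_workers(chunk):
    """Placeholder (hypothetical) for handing a chunk to a worker process."""
    pass

def report_progress():
    """Placeholder (hypothetical) for updating the progress report."""
    pass

if __name__ == '__main__':
    # Bounded queue so the reader cannot run arbitrarily far ahead of the parent.
    chunk_queue = mp.Queue(maxsize=16)
    reader = mp.Process(target=read_files, args=(sys.argv[1:], chunk_queue))
    reader.start()
    while True:
        chunk = chunk_queue.get()
        if chunk is None:
            break
        dispatch_to_workers(chunk)  # hand the chunk to a worker
        report_progress()           # parent stays responsive to results
    reader.join()
{code}
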
I am currently testing these two optimizations. Initial results are 
encouraging and consistently above 100k rows/sec. However, I've noticed some 
timeouts when VNODES are enabled unless I increase the batch size; I am 
investigating this.

--

bq. Curious, with the introduction of Cython, are you going to give the option 
to build, target a specific platform, or build for many and include extensions 
pre-built?

I've modified _setup.py_ with an option to build in place ({{python setup.py 
build_ext --inplace}}). The idea is to write a blog post with instructions on 
how to boost performance by using an installed, cythonized driver ({{export 
CQLSH_NO_BUNDLED=True}}) and by building copyutil; a rough sketch of the 
approach follows. It is not my intention to modify the existing package for 
cqlsh.
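
For reference, an illustrative sketch of what the in-place build option 
amounts to; the module path and package name below are assumptions, not the 
exact contents of _setup.py_:

{code}
# setup.py -- illustrative sketch only, not the actual pylib/setup.py
from setuptools import setup

try:
    from Cython.Build import cythonize
    # Compile the COPY machinery into a C extension when Cython is available.
    ext_modules = cythonize(['cqlshlib/copyutil.py'])
except ImportError:
    ext_modules = []  # fall back to pure Python

setup(
    name='cqlshlib',
    packages=['cqlshlib'],
    ext_modules=ext_modules,
)
{code}

With something like this in place, {{python setup.py build_ext --inplace}} 
builds the extension next to the sources, and {{export CQLSH_NO_BUNDLED=True}} 
makes cqlsh pick up the separately installed (and cythonized) driver instead 
of the bundled one.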

bq. I have visited message coalescing in the driver a couple times, and I 
couldn't make it matter. It was in a more controlled environment than AWS, but 
I was trying to emulate network delay using local intervention.

As far as I understand from [this 
blog|http://www.datastax.com/dev/blog/performance-doubling-with-message-coalescing],
 it matters mostly in virtualized environments, especially AWS without 
enhanced networking. Another thing to note is that Nagle's algorithm is not 
disabled; I did not appreciate this when I wrote my earlier comment.
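
For context, disabling Nagle is a one-line socket option. This is a generic 
illustration of {{TCP_NODELAY}} on a plain socket, not the driver's actual 
connection code:

{code}
import socket

# Nagle's algorithm holds back small writes so TCP can coalesce them,
# which adds latency; TCP_NODELAY turns that behaviour off.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
sock.connect(('127.0.0.1', 9042))  # 9042: the native protocol port
{code}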

> COPY FROM on large datasets: fix progress report and debug performance
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-11053
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11053
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>         Attachments: copy_from_large_benchmark.txt, 
> copy_from_large_benchmark_2.txt, parent_profile.txt, parent_profile_2.txt, 
> worker_profiles.txt, worker_profiles_2.txt
>
>
> Running COPY FROM on a large dataset (20G divided into 20M records) revealed 
> two issues:
> * The progress report is incorrect: it is very slow until almost the end of 
> the test, at which point it catches up extremely quickly.
> * The performance in rows per second is similar to running smaller tests with 
> a smaller cluster locally (approx. 35,000 rows per second). As a comparison, 
> cassandra-stress manages 50,000 rows per second under the same set-up, and is 
> therefore 1.5 times faster. 
> See attached file _copy_from_large_benchmark.txt_ for the benchmark details.


