Launch cqlsh with the "--debug" option: cqlsh --debug. You should see
which Python driver it is using. My guess is that it is still using the
embedded driver rather than the installed one, which by default should
be Cythonized.

This is what is shown on my machine for the embedded driver:

Using CQL driver: <module 'cassandra' from
'/home/stefi/git/cstar/cassandra/bin/../lib/cassandra-driver-internal-only-3.7.0.post0-2481531.zip/cassandra-driver-3.7.0.post0-2481531/cassandra/__init__.py'>

And this is for an installed driver:

Using CQL driver: <module 'cassandra' from
'/usr/local/lib/python2.7/dist-packages/cassandra_driver-3.7.1.post0-py2.7-linux-x86_64.egg/cassandra/__init__.py'>

You should also notice that cqlsh takes a little bit longer to start when
using an external driver.
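
If it helps, the check above can be scripted. This is only a rough sketch: it assumes the debug line has the same shape as the two examples shown here, and that the embedded driver can be recognised by "cassandra-driver-internal-only" in its path. The function name classify_driver is made up for illustration and is not part of cqlsh.

```python
import re

# Assumption: the driver bundled with C* sits in a zip whose name contains
# this marker, as in the debug output quoted in this mail.
EMBEDDED_MARKER = "cassandra-driver-internal-only"


def classify_driver(debug_text):
    """Return (kind, path) for a 'Using CQL driver' line, or None if absent.

    kind is 'embedded' when the path contains EMBEDDED_MARKER and
    'installed' otherwise.
    """
    match = re.search(
        r"Using CQL driver: <module 'cassandra' from\s+'([^']+)'>",
        debug_text,
    )
    if match is None:
        return None
    path = match.group(1)
    kind = "embedded" if EMBEDDED_MARKER in path else "installed"
    return kind, path


if __name__ == "__main__":
    import sys

    # Example: cqlsh --debug 2>&1 | python classify_driver.py
    print(classify_driver(sys.stdin.read()))
```

The text to feed it is whatever "cqlsh --debug" prints at startup; whether that goes to stdout or stderr is not something I checked, hence the 2>&1 in the comment.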

If you want to double check that the driver is Cythonized (any installed
driver should be), you can run "unzip -l" on the egg, for example:

unzip -l /usr/local/lib/python2.7/dist-packages/cassandra_driver-3.7.1.post0-py2.7-linux-x86_64.egg

You should see files with the extensions ".pyc" and ".so"; the ".so"
files in particular indicate that the driver was compiled with Cython.
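
The same check can be done from Python with the standard zipfile module. This is a minimal sketch that assumes the egg is a zip archive (which is what "unzip -l" implies); the helper name compiled_extensions is hypothetical.

```python
import zipfile


def compiled_extensions(egg_path):
    """Return the sorted names of ".so" members inside a zipped egg."""
    with zipfile.ZipFile(egg_path) as egg:
        return sorted(name for name in egg.namelist() if name.endswith(".so"))


if __name__ == "__main__":
    import sys

    # Pass the path of the egg on the command line.
    names = compiled_extensions(sys.argv[1])
    if names:
        print("Found {} compiled extension(s):".format(len(names)))
        for name in names:
            print("  " + name)
    else:
        print("No .so files found, the driver is probably pure Python")
```

An empty result would suggest the driver was installed without the C extensions.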

Lastly, regarding point 3, Cython does not ship with C*. However, you don't
need it unless you also want to compile *pylib/cqlshlib/copyutil.py* with
Cython. In my opinion this is not worth it: the biggest improvement comes
from a Cythonized driver, at least in the tests I did. Therefore, your
steps above look correct.

It would be very odd to see no performance gain with a Cythonized
driver but, as I said, performance depends on the schema. Perhaps your
schema has complex types like collections, where parsing is the dominant
factor, although in that case I would have expected cassandra-loader to
outperform COPY FROM. The parsing is done by *copyutil.py*, by the way,
so if parsing dominates you may well want to Cythonize it too
(instructions in the blog). Because the Python code in *copyutil.py* is
not optimized for Cython, though, don't expect huge gains.

If you want to move the burden of parsing from the client to the cluster,
you can do so with PREPAREDSTATEMENTS=False, but I only recommend this if
the cluster machines are idle.

Finally, make sure to try out some COPY FROM parameters, especially
NUMPROCESSES and CHUNKSIZE. For the first parameter, observe the CPU
(dstat -lvrn 10) and increase the number of processes if you have idle
CPU on the client, or decrease it if the CPU is blocked. As for
CHUNKSIZE, because it is expressed in rows, the ideal value for your
schema may be higher or lower than 5,000, so try different values, such
as 500 and 50,000, and see what this does to performance.
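
To make such a sweep less tedious, the COPY statements can be generated rather than typed by hand. A small sketch follows; the table name "ks.tbl", the file "data.csv" and the function name copy_from_statements are placeholders I made up, so substitute your own.

```python
import itertools


def copy_from_statements(table, csv_path, num_processes, chunk_sizes):
    """Build one COPY FROM statement per NUMPROCESSES/CHUNKSIZE combination."""
    statements = []
    for procs, chunk in itertools.product(num_processes, chunk_sizes):
        statements.append(
            "COPY {} FROM '{}' WITH NUMPROCESSES={} AND CHUNKSIZE={};".format(
                table, csv_path, procs, chunk
            )
        )
    return statements


if __name__ == "__main__":
    # Hypothetical table and file names; the value lists are just examples.
    for statement in copy_from_statements(
        "ks.tbl", "data.csv", [4, 8], [500, 5000, 50000]
    ):
        print(statement)
```

Each printed statement can be pasted into cqlsh and timed, truncating the table between runs so the results are comparable.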



On Tue, Mar 14, 2017 at 9:23 PM, Artur R <ar...@gpnxgroup.com> wrote:

> HI!
>
> I am trying to increase performance of COPY FROM by installing "*Cython
> <http://cython.org/> and libev
> <http://software.schmorp.de/pkg/libev.html> C extensions"* as
> described here: https://www.datastax.com/dev/blog/six-parameters-affecting-cqlsh-copy-from-performance.
>
> My steps are as the following:
>
>    1. Install Cassandra 3.10, start it, add keyspace and table
>
>    2. Install C Extensions:
>    sudo apt-get install gcc python-dev
>
>    3. Don't install Cython because as far as I understand it ships with
>    C* 3 by default on step 1), so skip it
>
>    4. Install libev:
>    sudo apt-get install libev4 libev-dev
>
>    5. Reinstall C* driver (because as far as I understand it shipped with
>    C* on step 1):
>    sudo pip install cassandra-driver
>
>    6. export CQLSH_NO_BUNDLED=TRUE
>
>    7. cqlsh to node and try COPY FROM
>
> And after all these steps above performance of COPY FROM is the same as
> before.
> I tested it with single node cluster and with multiple nodes cluster - it
> doesn't impact on performance.
> However, I see that COPY FROM is CPU bounded on my machines, so these
> steps should definitely increase the performance.
>
>
> The question: what I did wrong? Maybe some step is missed.
> How to check that COPY FROM really uses "*Cython
> <http://cython.org/> and libev
> <http://software.schmorp.de/pkg/libev.html> C extensions"* ?
>



-- 

<http://www.datastax.com/>

STEFANIA ALBORGHETTI

Software engineer | +852 6114 9265 | stefania.alborghe...@datastax.com

