COPY FROM performance

2017-03-14 Thread Artur R
Hi! I am trying to improve the performance of COPY FROM by installing the "Cython and libev C extensions" as described here: https://www.datastax.com/dev/blog/six-parameters-affecting-cqlsh-copy-from-performance . My steps are as the

Re: HELP with bulk loading

2017-03-14 Thread Artur R
ssandra-loader.
>>
>> Depending on your schema, one or the other may do slightly better.
>>
>> On Fri, Mar 10, 2017 at 8:11 AM, Ryan Svihla <r...@foundev.pro> wrote:
>>
>>> I suggest using cassandra loader
>>>
>>> https://github.com

How to obtain partition size

2017-03-13 Thread Artur R
Hello! I can't find where C* stores information about partition sizes (if it stores it at all). So, the questions: 1. How can I obtain the size (in rows or in bytes, it doesn't matter) of a particular partition? I know that there is the *system.size_estimates* table with *mean_partition_size*, but it's

HELP with bulk loading

2017-03-09 Thread Artur R
Hello all! There are ~500 GB of CSV files, and I am trying to find a way to load them into a C* table (a new, empty C* cluster of 3 nodes, replication factor 2) within a reasonable time (say, 10 hours using 3-4 c3.8xlarge EC2 instances). My first impulse was to use CQLSSTableWriter, but it
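
For a sense of scale, the target in the bulk-loading question (~500 GB in 10 hours, replication factor 2, 3 nodes) can be turned into a sustained write rate. The following is only a rough sketch, not anything from the thread itself: the function name is hypothetical, it assumes decimal units (1 GB = 1000 MB), that the data lands evenly across nodes, and that the bytes written roughly equal the CSV size (ignoring serialization overhead, compaction and retries).

```python
def required_throughput_mb_s(data_gb, hours, replication_factor=1, nodes=1):
    """Return (client, cluster_wide, per_node) sustained write rates in MB/s.

    Hypothetical helper for back-of-the-envelope sizing only.
    Assumes decimal units and an even spread of replicas across nodes.
    """
    seconds = hours * 3600
    client = data_gb * 1000 / seconds        # rate the loader must push
    cluster = client * replication_factor    # every row is stored RF times
    per_node = cluster / nodes               # share each node has to absorb
    return client, cluster, per_node


if __name__ == "__main__":
    # Figures taken from the question: ~500 GB, 10 hours, RF 2, 3 nodes.
    client, cluster, per_node = required_throughput_mb_s(500, 10, 2, 3)
    print(f"client: {client:.1f} MB/s")
    print(f"cluster-wide: {cluster:.1f} MB/s")
    print(f"per node: {per_node:.1f} MB/s")
```

Under these assumptions the loader has to sustain about 13.9 MB/s, which becomes about 27.8 MB/s of replica writes cluster-wide, or roughly 9.3 MB/s per node.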