Pierre N. created CASSANDRA-9323:
------------------------------------

             Summary: Bulk upload is slow
                 Key: CASSANDRA-9323
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9323
             Project: Cassandra
          Issue Type: Bug
            Reporter: Pierre N.
         Attachments: App.java

Hi, 

When I bulk load an SSTable created with CQLSSTableWriter, it is very slow. I 
tested on a fresh Cassandra node (nothing in the keyspace or tables) with good 
hardware (8 x 2.8 GHz, 32 GB RAM) but a conventional hard disk (I don't think 
an SSD would improve performance in this case). 

When I upload an SSTable from a different server, I get an average of 
3 MB/sec. In the attached example I managed to get 5 MB/sec, which is still 
slow.
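
For a sense of scale, here is a small sketch that turns these rates into 
transfer times. The 10 GiB size is a hypothetical figure chosen only for 
illustration (it is not from my test), and the class and method names are 
made up for this example:

```java
final class TransferTime {
    /** Seconds needed to move sizeBytes at the given rate (1 MB = 1024 * 1024 bytes). */
    public static long secondsToTransfer(long sizeBytes, double megabytesPerSecond) {
        double bytesPerSecond = megabytesPerSecond * 1024 * 1024;
        return (long) Math.ceil(sizeBytes / bytesPerSecond);
    }

    public static void main(String[] args) {
        long tenGiB = 10L * 1024 * 1024 * 1024; // hypothetical SSTable size
        for (double rate : new double[] {1, 3, 5}) {
            long seconds = secondsToTransfer(tenGiB, rate);
            System.out.printf("%.0f MB/sec: %d s (about %d min)%n",
                    rate, seconds, seconds / 60);
        }
    }
}
```

At 3 MB/sec, such a file would take close to an hour to stream.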

During the streaming process I noticed that one core of the server is at full 
CPU usage, so I think the operation is CPU-bound on the server side. I quickly 
attached a sampling profiler to the Cassandra instance and got the following 
output: 

https://i.imgur.com/IfLc2Ip.png

So I think (though I may be wrong, because sampling is imprecise) that during 
streaming the table is deserialized and reserialized to another SSTable, and 
that this deserialize/serialize step is what takes a large amount of CPU and 
slows down the insert speed.

Can someone confirm that bulk load is slow? I also tested on my own computer 
and barely reached 1 MB/sec. 

I don't understand the point of fully deserializing the table I just built 
with CQLSSTableWriter (building and sorting the table is already a long 
process). Couldn't it just copy the table from offset X to offset Y (using the 
index information, for example) without deserializing and reserializing it?
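
To make the suggestion concrete, here is a minimal sketch of what I mean by 
copying from offset X to offset Y. It uses only the standard 
java.nio FileChannel.transferTo call on local files. It is not Cassandra's 
streaming code, the RangeCopy class and copyRange method are hypothetical 
names, and it assumes the two offsets are already known (for example, from the 
index):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class RangeCopy {
    /**
     * Copies the bytes in [start, end) of source into a new target file
     * without decoding them. Returns the number of bytes actually copied.
     */
    public static long copyRange(Path source, Path target, long start, long end)
            throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = start;
            while (position < end) {
                // transferTo may move fewer bytes than requested, so loop.
                long moved = in.transferTo(position, end - position, out);
                if (moved <= 0) {
                    break; // source is shorter than the requested range
                }
                position += moved;
            }
            return position - start;
        }
    }
}
```

The point is that the bytes are moved as-is, so the CPU cost should be close 
to that of a plain file copy rather than a full rebuild of the table. Whether 
the receiving node can accept the data in this form is exactly what I am 
asking.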




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
