[
https://issues.apache.org/jira/browse/CASSANDRA-9323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pierre N. updated CASSANDRA-9323:
---------------------------------
Description:
Bulk uploading an sstable created with CQLSSTableWriter is very slow. I
tested on a fresh Cassandra node (no existing data in any keyspace or table)
with good hardware (8 x 2.8 GHz, 32 GB RAM) but a conventional hard disk (I
don't think an SSD would improve performance in this case).
When I upload an sstable from a different server I get an average of 3 MB/sec;
with the attached example I managed to get 5 MB/sec, which is still slow.
During the streaming process I noticed that one core of the server is at full
CPU usage, so I think the operation is CPU-bound on the server side. I quickly
attached a sampling profiler to the Cassandra instance and got the following
output:
https://i.imgur.com/IfLc2Ip.png
So I think (though I may be wrong, because sampling is imprecise) that during
streaming the table is deserialized and reserialized into another sstable, and
that this deserialize/serialize step takes a large amount of CPU, slowing down
the insert speed.
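One way to cross-check the "one core is saturated" observation without a
profiler is to compare accumulated CPU time per thread. The following is only
a rough sketch using the standard java.lang.management API; the class and
method names (ThreadCpu, Usage, snapshot) are made up for illustration, and it
inspects the JVM it runs in, so checking a Cassandra node this way would mean
reading the same MXBean over JMX, which is not shown here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Cassandra.
final class ThreadCpu {
    static final class Usage {
        final String name;
        final long cpuNanos;

        Usage(String name, long cpuNanos) {
            this.name = name;
            this.cpuNanos = cpuNanos;
        }
    }

    // Returns the threads of the current JVM, busiest first.
    static List<Usage> snapshot() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Usage> out = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return out; // this JVM cannot report per-thread CPU time
        }
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id); // -1 if the thread died or timing is disabled
            ThreadInfo info = mx.getThreadInfo(id); // null if the thread is gone
            if (cpu >= 0 && info != null) {
                out.add(new Usage(info.getThreadName(), cpu));
            }
        }
        out.sort(Comparator.comparingLong((Usage u) -> u.cpuNanos).reversed());
        return out;
    }

    public static void main(String[] args) {
        List<Usage> usages = snapshot();
        // Print the five busiest threads with their CPU time in milliseconds.
        for (int i = 0; i < Math.min(5, usages.size()); i++) {
            Usage u = usages.get(i);
            System.out.println(u.name + " " + (u.cpuNanos / 1_000_000) + " ms");
        }
    }
}
```

If a single thread accounts for nearly all of the CPU time accumulated during a
load, that would be consistent with a single-threaded, CPU-bound step.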
Can someone confirm that bulk load is slow? I also tested on my own computer
and barely reached 1 MB/sec.
I don't understand the point of fully deserializing a table I have just built
with CQLSSTableWriter (building and sorting the table is already a long
process). Couldn't it just copy the table from offset X to offset Y (using
index information, for example) without deserializing and reserializing it?
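To make the "copy from offset X to offset Y" idea concrete, here is a minimal
sketch of a raw byte-range copy with java.nio's FileChannel.transferTo. It
only illustrates moving bytes without decoding them. It is not how Cassandra
streaming works, and it ignores everything a real sstable transfer would have
to handle (compression, checksums, index and metadata components). The class
and method names (RangeCopy, copyRange) are hypothetical.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Cassandra.
final class RangeCopy {
    // Copies bytes [start, end) of src into dst (overwriting dst) without
    // interpreting them. Returns the number of bytes actually copied.
    static long copyRange(Path src, Path dst, long start, long end) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = start;
            while (position < end) {
                // transferTo may move fewer bytes than requested, so loop.
                long n = in.transferTo(position, end - position, out);
                if (n <= 0) {
                    break; // source is shorter than the requested range
                }
                position += n;
            }
            return position - start;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("range-src", ".bin");
        Path dst = Files.createTempFile("range-dst", ".bin");
        Files.write(src, "0123456789".getBytes(StandardCharsets.US_ASCII));
        long copied = copyRange(src, dst, 2, 7);
        System.out.println(copied + " bytes copied");
    }
}
```

A copy like this costs very little CPU per byte, which is why the question of
whether the receiving node really needs to decode every row matters.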
> Bulk upload is slow
> -------------------
>
> Key: CASSANDRA-9323
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9323
> Project: Cassandra
> Issue Type: Bug
> Reporter: Pierre N.
> Attachments: App.java
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)