Thanks a lot for the reply, Raj. I understand they are different. But if we define a BATCH as UNLOGGED, it will not guarantee an atomic transaction, and it becomes more like a data import tool. As far as I know, a BATCH statement packs several mutations into one RPC to save time. Similarly, the Bulk Loader packs all the mutations into an SSTable file and (I think) may also be able to save a lot of time.
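To make the "one RPC instead of many" reasoning concrete, here is a toy latency model. It is a sketch under stated assumptions, not a description of how Cassandra's coordinator actually processes a batch. Every client request is assumed to pay a fixed round-trip cost (`rtt_ms`), and every mutation a fixed processing cost (`per_mutation_ms`). Both parameter names and the numbers in the example are hypothetical.

```python
# Toy latency model, NOT a Cassandra measurement or implementation.
# Assumptions (hypothetical): each client->coordinator request costs a fixed
# round trip (rtt_ms), and each mutation costs a fixed amount of work
# (per_mutation_ms) regardless of how it was sent.


def sequential_sync_ms(n_rows: int, rtt_ms: float, per_mutation_ms: float) -> float:
    """n_rows synchronous single-row inserts: each one waits for its own round trip."""
    return n_rows * (rtt_ms + per_mutation_ms)


def single_batch_ms(n_rows: int, rtt_ms: float, per_mutation_ms: float) -> float:
    """All n_rows sent in one BATCH request: the round trip is paid only once."""
    return rtt_ms + n_rows * per_mutation_ms


if __name__ == "__main__":
    # Hypothetical costs, chosen for illustration only.
    rows, rtt, work = 1000, 1.0, 0.5
    seq = sequential_sync_ms(rows, rtt, work)
    batch = single_batch_ms(rows, rtt, work)
    print(f"sequential: {seq:.0f} ms, one batch: {batch:.0f} ms, "
          f"speedup: {seq / batch:.1f}x")
```

In this model the whole saving comes from amortizing the round trip, so it shrinks as per-mutation work dominates. It also ignores the coordinator forwarding each mutation to the replicas that own it, which an UNLOGGED batch spanning many partitions still has to do.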
I am curious whether, on the coordinator server, Batch Insert and Bulk Loader are essentially the same thing. I mean, are they implemented in a similar way?

P.S. As a test, I randomly inserted 1000 rows into a simple table on my laptop. Synchronous single-row inserts took almost 2 s to finish, but synchronous batch inserts took only about 900 ms. That is a huge performance improvement; is this expected? I also used CQLSSTableWriter to write these 1000 insertions into a single SSTable file, and that took around 2 s on my laptop, which seems pretty slow.

Thanks!
- Dong

> On Dec 1, 2014, at 2:33 AM, Rajanarayanan Thottuvaikkatumana <rnambood...@gmail.com> wrote:
>
> BATCH statement and Bulk Load are totally different things. The BATCH statement comes in the atomic transaction space, which provides a way to make more than one statement into an atomic unit, and the bulk loader provides the ability to bulk load external data into a cluster. The two are totally different things and cannot be compared.
>
> Thanks
> -Raj
>
> On 01-Dec-2014, at 4:32 am, Dong Dai <daidon...@gmail.com> wrote:
>
>> Hi, all,
>>
>> I have a performance question about batch insert and bulk load.
>>
>> According to the documents, to import a large volume of data into Cassandra, Batch Insert and Bulk Load can both be options. Using batch insert is pretty straightforward, but there has not been an ‘official’ way to use Bulk Load to import the data (in this case, I mean the data is generated online).
>>
>> So, I am thinking that clients first use CQLSSTableWriter to create the SSTable files, then use “org.apache.cassandra.tools.BulkLoader” to import these SSTables into Cassandra directly.
>>
>> The question is: can I expect better performance using the BulkLoader this way compared with using Batch insert?
>>
>> I am not so familiar with the implementation of Bulk Load, but I do see a huge performance improvement using Batch Insert. I really want to know the upper limits of the write performance. Any comment will be helpful. Thanks!
>>
>> - Dong