With the DataStax bulk loader you can only export from a Cassandra table, not import into Cassandra (loading works only into a DSE cluster).
And +1 on the confusing name of batches ... yes it’s for writes but not for loading data.

Amanda

> On Aug 5, 2019, at 8:14 AM, Durity, Sean R <sean_r_dur...@homedepot.com> wrote:
>
> DataStax has a very fast bulk load tool - dsebulk. Not sure if it is
> available for open source or not. In my experience so far, I am very
> impressed with it.
>
> Sean Durity – Staff Systems Engineer, Cassandra
>
> -----Original Message-----
> From: p...@xvalheru.org <p...@xvalheru.org>
> Sent: Saturday, August 3, 2019 6:06 AM
> To: user@cassandra.apache.org
> Cc: Dimo Velev <dimo.ve...@gmail.com>
> Subject: [EXTERNAL] Re: loading big amount of data to Cassandra
>
> Thanks to all,
>
> I'll try the SSTables.
>
> Thanks
>
> Pat
>
>> On 2019-08-03 09:54, Dimo Velev wrote:
>> Check out the CQLSSTableWriter java class -
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_cassandra_blob_trunk_src_java_org_apache_cassandra_io_sstable_CQLSSTableWriter.java&d=DwIDaQ&c=MtgQEAMQGqekjTjiAhkudQ&r=aC_gxC6z_4f9GLlbWiKzHm1vucZTtVYWDDvyLkh8IaQ&m=0F8VMU_BKNwicZFDQ0Nx54JvvS3MHT92_W1RRwF3deA&s=F43aPz7NPfAfs5c_oRJQvUiTMJjDmpB_BXAHKhPfW2A&e=
>> . You use it to generate sstables - you need to write a small program
>> for that. You can then stream them over the network using the
>> sstableloader (either use the utility or use the underlying classes to
>> embed it in your program).
>>
>>> On 3. Aug 2019, at 07:17, Ayub M <hia...@gmail.com> wrote:
>>>
>>> Dimo, how do you generate sstables? Do you mean load data locally on
>>> a cassandra node and use sstableloader?
>>>
>>> On Fri, Aug 2, 2019, 5:48 PM Dimo Velev <dimo.ve...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Batches will actually slow down the process because they mean a
>>>> different thing in C* - as you read, they just group changes
>>>> together that you want executed atomically.
>>>>
>>>> Cassandra does not really have indices, so that is different from a
>>>> relational DB. However, after writing stuff to Cassandra it
>>>> generates many smallish partitions of the data. These are then
>>>> joined together in the background to improve read performance.
>>>>
>>>> You have two options from my experience:
>>>>
>>>> Option 1: use the normal CQL API in async mode. This will create a
>>>> high CPU load on your cluster. Depending on whether that is fine
>>>> for you, that might be the easiest solution.
>>>>
>>>> Option 2: generate sstables locally and use the sstableloader to
>>>> upload them into the cluster. The streaming does not generate high
>>>> CPU load, so it is a viable option for clusters with other
>>>> operational load.
>>>>
>>>> Option 2 scales with the number of cores of the machine generating
>>>> the sstables. If you can split your data, you can generate sstables
>>>> on multiple machines. In contrast, option 1 scales with your
>>>> cluster. If you have a large cluster that is idling, it would be
>>>> better to use option 1.
>>>>
>>>> With both options I was able to write at about 50-100K rows / sec
>>>> on my laptop and local Cassandra. The speed heavily depends on the
>>>> size of your rows.
>>>>
>>>> Back to your question - I guess option 2 is similar to what you
>>>> are used to from tools like sqlloader for relational DBMSes.
>>>>
>>>> I had a requirement of loading a few hundred million rows per day
>>>> into an operational cluster, so I went with option 2 to offload the
>>>> CPU load and reduce the impact on the reading side during the loads.
>>>>
>>>> Cheers,
>>>> Dimo
>>>>
>>>> Sent from my iPad
>>>>
>>>>> On 2. Aug 2019, at 18:59, p...@xvalheru.org wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I need to upload about 7 billion records to Cassandra. What is
>>>>> the best setup of Cassandra for this task? Will using batches
>>>>> speed up the upload (I've read somewhere that batch in Cassandra
>>>>> is dedicated to atomicity, not to speeding up communication)? How
>>>>> does Cassandra work internally with regard to indexing? In SQL
>>>>> databases, when uploading such an amount of data it is suggested
>>>>> to turn off indexing and then turn it back on. Is something
>>>>> similar possible in Cassandra?
>>>>>
>>>>> Thanks for all suggestions.
>>>>>
>>>>> Pat
>>>>>
>>>>> ----------------------------------------
>>>>> Freehosting PIPNI -
>>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.pipni.cz_&d=DwIDaQ&c=MtgQEAMQGqekjTjiAhkudQ&r=aC_gxC6z_4f9GLlbWiKzHm1vucZTtVYWDDvyLkh8IaQ&m=0F8VMU_BKNwicZFDQ0Nx54JvvS3MHT92_W1RRwF3deA&s=nccgCDZwHe3qri11l3VV1if5GR1iqcWR5gjf6-J1C5U&e=
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>
> ________________________________
>
> The information in this Internet Email is confidential and may be legally
> privileged. It is intended solely for the addressee. Access to this Email by
> anyone else is unauthorized. If you are not the intended recipient, any
> disclosure, copying, distribution or any action taken or omitted to be taken
> in reliance on it, is prohibited and may be unlawful. When addressed to our
> clients any opinions or advice contained in this Email are subject to the
> terms and conditions expressed in any applicable governing The Home Depot
> terms of business or client engagement letter. The Home Depot disclaims all
> responsibility and liability for the accuracy and content of this attachment
> and for any damages or losses arising from any inaccuracies, errors, viruses,
> e.g., worms, trojan horses, etc., or other items of a destructive nature,
> which may be contained in this attachment and shall not be liable for direct,
> indirect, consequential or special damages in connection with this e-mail
> message or its attachment.
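
For a sense of scale, the figures quoted in the thread can be combined into a rough load-time estimate. This is only a back-of-the-envelope sketch: it assumes the 50-100K rows/sec rate (measured by Dimo on a laptop against a local Cassandra) holds constant for the whole load of about 7 billion rows, which a real cluster with different row sizes may not match. The function name `load_hours` is made up for this illustration.

```python
# Rough load-time estimate built only from numbers quoted in the thread.
TOTAL_ROWS = 7_000_000_000               # size of the data set in the original question
RATES_ROWS_PER_SEC = (50_000, 100_000)   # rate range reported for a laptop + local Cassandra


def load_hours(total_rows: int, rows_per_sec: int) -> float:
    """Hours needed to write total_rows at a constant rows_per_sec (assumed steady)."""
    return total_rows / rows_per_sec / 3600


if __name__ == "__main__":
    for rate in RATES_ROWS_PER_SEC:
        print(f"{rate:>7,} rows/s -> {load_hours(TOTAL_ROWS, rate):.1f} h")
```

Under these assumptions a single writer would need somewhere between roughly 19 and 39 hours, which is why splitting the data across several machines (option 2) or spreading the writes over a large idle cluster (option 1) matters at this volume.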