cassandra-loader is also useful because you don't need to create sstables.
https://github.com/brianmhess/cassandra-loader

Hiro

On Tue, Aug 6, 2019 at 12:15 AM Durity, Sean R
<sean_r_dur...@homedepot.com> wrote:
>
> DataStax has a very fast bulk load tool - dsbulk. Not sure if it is 
> available for open source or not. In my experience so far, I am very 
> impressed with it.
>
>
>
> Sean Durity – Staff Systems Engineer, Cassandra
>
> -----Original Message-----
> From: p...@xvalheru.org <p...@xvalheru.org>
> Sent: Saturday, August 3, 2019 6:06 AM
> To: user@cassandra.apache.org
> Cc: Dimo Velev <dimo.ve...@gmail.com>
> Subject: [EXTERNAL] Re: loading big amount of data to Cassandra
>
> Thanks to all,
>
> I'll try the SSTables.
>
> Thanks
>
> Pat
>
> On 2019-08-03 09:54, Dimo Velev wrote:
> > Check out the CQLSSTableWriter java class -
> > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/CQLSSTableWriter.java
> > . You use it to generate sstables - you need to write a small program
> > for that. You can then stream them over the network using the
> > sstableloader (either use the utility or use the underlying classes to
> > embed it in your program).
> >
> > On 3. Aug 2019, at 07:17, Ayub M <hia...@gmail.com> wrote:
> >
> >> Dimo, how do you generate sstables? Do you mean load data locally on
> >> a cassandra node and use sstableloader?
> >>
> >> On Fri, Aug 2, 2019, 5:48 PM Dimo Velev <dimo.ve...@gmail.com>
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> Batches will actually slow down the process because they mean a
> >>> different thing in C* - as you read they are just grouping changes
> >>> together that you want executed atomically.
> >>>
> >>> Cassandra does not really have indices, so that is different from a
> >>> relational DB. However, as you write data to Cassandra it
> >>> produces many smallish SSTables on disk. These are then merged
> >>> in the background (compaction) to improve read performance.
> >>>
> >>> You have two options from my experience:
> >>>
> >>> Option 1: use normal CQL api in async mode. This will create a
> >>> high CPU load on your cluster. Depending on whether that is fine
> >>> for you that might be the easiest solution.
> >>>
> >>> Option 2: generate sstables locally and use the sstableloader to
> >>> upload them into the cluster. The streaming does not generate high
> >>> cpu load so it is a viable option for clusters with other
> >>> operational load.
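
To make option 1 a bit more concrete: one common pattern is to send single-row writes concurrently and cap how many are in flight, rather than grouping them into batches. Below is a minimal Python sketch of that pattern using only the standard library. `write_row` is a hypothetical stand-in for one INSERT through whatever driver you use (it is not a real driver API), and the `max_in_flight` and `workers` defaults are arbitrary assumptions, not tuned values.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def load_rows(rows, write_row, max_in_flight=128, workers=16):
    """Write rows one by one, concurrently; return (ok_count, failed_rows).

    write_row is a hypothetical callable standing in for a single-row
    INSERT; swap in your driver's call.
    """
    slots = threading.BoundedSemaphore(max_in_flight)
    lock = threading.Lock()
    ok_count = 0
    failed = []

    def on_done(future, row):
        nonlocal ok_count
        with lock:
            if future.exception() is None:
                ok_count += 1
            else:
                failed.append(row)  # keep the row for a later retry
        slots.release()  # free a slot so the producer can submit more

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for row in rows:
            slots.acquire()  # blocks while max_in_flight writes are pending
            future = pool.submit(write_row, row)
            future.add_done_callback(lambda f, r=row: on_done(f, r))
    # Leaving the "with" block waits for all outstanding writes.
    return ok_count, failed
```

The cap matters because an unbounded producer can queue writes faster than the cluster absorbs them; the semaphore makes the reader of the input slow down instead.
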
> >>>
> >>> Option 2 scales with the number of cores of the machine generating
> >>> the sstables. If you can split your data you can generate sstables
> >>> on multiple machines. In contrast, option 1 scales with your
> >>> cluster. If you have a large cluster that is idling, it would be
> >>> better to use option 1.
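
If you run option 2 on several machines, the input has to be divided so that each generator gets its own slice. A minimal sketch, assuming every record has some string key: hash the key to pick a machine. The names `shard_for` and `split_records` are made up for illustration, and CRC32 is only a stable way to divide the input here, not Cassandra's partitioner.

```python
import zlib


def shard_for(key, num_workers):
    """Map a string key to a worker index in a stable, repeatable way."""
    # zlib.crc32 is used instead of hash() because hash() of a str
    # changes between Python processes.
    return zlib.crc32(key.encode("utf-8")) % num_workers


def split_records(records, key_of, num_workers):
    """Group records into num_workers lists, one per generating machine."""
    shards = [[] for _ in range(num_workers)]
    for rec in records:
        shards[shard_for(key_of(rec), num_workers)].append(rec)
    return shards
```

For billions of rows you would write each slice to its own file rather than hold lists in memory; the assignment logic stays the same.
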
> >>>
> >>> With both options I was able to write at about 50-100K rows / sec
> >>> on my laptop and local Cassandra. The speed heavily depends on the
> >>> size of your rows.
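
For a rough sense of scale, the 50-100K rows/sec figure can be turned into a load-time estimate for the 7 billion records from the original question. This is plain arithmetic, and it assumes the rate stays constant and scales linearly with the number of loaders, which a real setup may not deliver.

```python
def load_hours(total_rows, rows_per_sec, parallel_loaders=1):
    """Estimated wall-clock hours, assuming a constant, linearly scaling rate."""
    return total_rows / (rows_per_sec * parallel_loaders) / 3600


total = 7_000_000_000
for rate in (50_000, 100_000):
    print(f"{rate} rows/s: {load_hours(total, rate):.1f} h")
```

That works out to roughly 39 hours at 50K rows/sec and about 19 hours at 100K rows/sec from a single loader, so the row size and the number of parallel loaders are worth measuring before committing to a plan.
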
> >>>
> >>> Back to your question: I guess option 2 is similar to what you
> >>> are used to from tools like SQL*Loader for relational DBMSes.
> >>> I had a requirement of loading a few hundred million rows per day
> >>> into an operational cluster, so I went with option 2 to offload the
> >>> CPU load and reduce the impact on reads during the loads.
> >>>
> >>> Cheers,
> >>> Dimo
> >>>
> >>>
> >>>> On 2. Aug 2019, at 18:59, p...@xvalheru.org wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I need to upload about 7 billion records to Cassandra. What
> >>>> is the best setup of Cassandra for this task? Will using batches
> >>>> speed up the upload (I've read somewhere that batch in Cassandra
> >>>> is dedicated to atomicity, not to speeding up communication)? How
> >>>> does Cassandra work internally with regard to indexing? In SQL
> >>>> databases, when uploading such an amount of data, it is suggested
> >>>> to turn off indexing and then turn it back on. Is something similar
> >>>> possible in Cassandra?
> >>>>
> >>>> Thanks for all suggestions.
> >>>>
> >>>> Pat
> >>>>
> >>>> ----------------------------------------
> >>>> Freehosting PIPNI - 
> >>>> http://www.pipni.cz/
> >>>>
> >>>>
> >>>>
> >>>
> >>
> > ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> >>>> For additional commands, e-mail: user-h...@cassandra.apache.org
> >>>>
> >>>
> >>>
> >>

