With the DataStax Bulk Loader you can only export from a Cassandra table; you 
cannot import into Cassandra (loading works only into a DSE cluster). 

And +1 on "batches" being a confusing name: yes, they are for writes, but not 
for bulk-loading data. 

Amanda 
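
As a rough illustration of the asynchronous-write approach described as option 1 further down this thread, here is a minimal Python sketch of issuing many writes concurrently while capping how many are in flight at once. It does not talk to Cassandra: `write_row` is a hypothetical stand-in for a driver's asynchronous execute call, and `MAX_IN_FLIGHT`, `load`, `run_demo` and the row format are assumptions made up for this example.

```python
import asyncio

MAX_IN_FLIGHT = 128  # assumed cap on concurrent writes; tune for your cluster


async def write_row(row, sink):
    """Hypothetical stand-in for one asynchronous INSERT through a driver."""
    await asyncio.sleep(0)  # yield to the event loop, as real network I/O would
    sink.append(row)


async def load(rows, sink, max_in_flight=MAX_IN_FLIGHT):
    """Write every row, keeping at most max_in_flight writes outstanding."""
    semaphore = asyncio.Semaphore(max_in_flight)
    pending = set()  # only the in-flight tasks, so memory stays bounded
    count = 0

    async def bounded_write(row):
        try:
            await write_row(row, sink)
        finally:
            semaphore.release()  # free a slot whether the write worked or not

    for row in rows:
        await semaphore.acquire()  # pauses the producer when the window is full
        task = asyncio.create_task(bounded_write(row))
        pending.add(task)
        task.add_done_callback(pending.discard)
        count += 1

    if pending:
        await asyncio.gather(*pending)  # wait for the last window to drain
    return count


def run_demo(n=1000):
    """Push n dummy rows through the bounded window and return the results."""
    sink = []
    written = asyncio.run(load(({"id": i} for i in range(n)), sink))
    return written, sink
```

Calling `run_demo()` pushes 1000 dummy rows through the window. With a real driver, the semaphore acts as backpressure so the client does not queue an unbounded number of requests; error handling and retries are deliberately left out of this sketch.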

> On Aug 5, 2019, at 8:14 AM, Durity, Sean R <sean_r_dur...@homedepot.com> 
> wrote:
> 
> DataStax has a very fast bulk load tool - dsbulk. Not sure if it is 
> available for open source or not. In my experience so far, I am very 
> impressed with it.
> 
> 
> 
> Sean Durity – Staff Systems Engineer, Cassandra
> 
> -----Original Message-----
> From: p...@xvalheru.org <p...@xvalheru.org>
> Sent: Saturday, August 3, 2019 6:06 AM
> To: user@cassandra.apache.org
> Cc: Dimo Velev <dimo.ve...@gmail.com>
> Subject: [EXTERNAL] Re: loading big amount of data to Cassandra
> 
> Thanks to all,
> 
> I'll try the SSTables.
> 
> Thanks
> 
> Pat
> 
>> On 2019-08-03 09:54, Dimo Velev wrote:
>> Check out the CQLSSTableWriter Java class:
>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/CQLSSTableWriter.java
>> You use it to generate SSTables; you need to write a small program
>> for that. You can then stream them over the network using
>> sstableloader (either use the utility or use the underlying classes to
>> embed it in your program).
>> 
>>> On 3. Aug 2019, at 07:17, Ayub M <hia...@gmail.com> wrote:
>>> 
>>> Dimo, how do you generate sstables? Do you mean load data locally on
>>> a cassandra node and use sstableloader?
>>> 
>>> On Fri, Aug 2, 2019, 5:48 PM Dimo Velev <dimo.ve...@gmail.com>
>>> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> Batches will actually slow down the process because they mean a
>>>> different thing in C*: as you read, they just group changes
>>>> together that you want executed atomically.
>>>> 
>>>> Cassandra does not really have indices, so in that respect it is
>>>> different from a relational DB. However, after you write data to
>>>> Cassandra it generates many smallish data files (SSTables). These are
>>>> then merged in the background (compaction) to improve read performance.
>>>> 
>>>> You have two options from my experience:
>>>> 
>>>> Option 1: use the normal CQL API in async mode. This will create a
>>>> high CPU load on your cluster. If that is acceptable for you, it is
>>>> probably the easiest solution.
>>>> 
>>>> Option 2: generate SSTables locally and use sstableloader to
>>>> stream them into the cluster. The streaming does not generate high
>>>> CPU load, so it is a viable option for clusters that carry other
>>>> operational load.
>>>> 
>>>> Option 2 scales with the number of cores of the machine generating
>>>> the sstables. If you can split your data you can generate sstables
>>>> on multiple machines. In contrast, option 1 scales with your
>>>> cluster. If you have a large cluster that is idling, it would be
>>>> better to use option 1.
>>>> 
>>>> With both options I was able to write at about 50-100K rows / sec
>>>> on my laptop and local Cassandra. The speed heavily depends on the
>>>> size of your rows.
>>>> 
>>>> Back to your question: I guess option 2 is similar to what you
>>>> are used to from tools like sqlloader for relational DBMSes.
>>>> 
>>>> I had a requirement to load a few hundred million rows per day into
>>>> an operational cluster, so I went with option 2 to offload the CPU
>>>> load and reduce the impact on the reading side during the loads.
>>>> 
>>>> Cheers,
>>>> Dimo
>>>> 
>>>> Sent from my iPad
>>>> 
>>>>> On 2. Aug 2019, at 18:59, p...@xvalheru.org wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I need to upload about 7 billion records to Cassandra. What is the
>>>>> best Cassandra setup for this task? Will using batches speed up the
>>>>> upload (I've read somewhere that batches in Cassandra are meant for
>>>>> atomicity, not for speeding up communication)? How does Cassandra
>>>>> work internally with regard to indexing? In SQL databases, when
>>>>> uploading this much data, it is suggested to turn indexing off and
>>>>> then back on afterwards. Is something similar possible in
>>>>> Cassandra?
>>>>> 
>>>>> Thanks for all suggestions.
>>>>> 
>>>>> Pat
>>>>> 
>>>>> ----------------------------------------
>>>>> Freehosting PIPNI - http://www.pipni.cz/
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>>>> For additional commands, e-mail: user-h...@cassandra.apache.org
> 

