Spark can scale out to as many nodes as you need and can be co-located with the 
data nodes. sstableloader won't be as performant for larger datasets. Although 
it can be run in parallel on different nodes, I don’t believe it is as fault 
tolerant.
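
For what it's worth, here is a rough sketch of what the Spark route could look 
like in PySpark. It assumes the job is submitted with the 
spark-cassandra-connector package on the classpath, that the SparkSession was 
built with Hive support and with spark.cassandra.connection.host set, and that 
the target table already exists. The table and keyspace names in the usage 
comment are made up for illustration; I haven't benchmarked this.

```python
def cassandra_options(keyspace, table):
    """Options for the spark-cassandra-connector DataFrame source."""
    return {"keyspace": keyspace, "table": table}


def copy_hive_to_cassandra(spark, hive_table, keyspace, table):
    """Read a Hive table and append its rows to an existing Cassandra table.

    `spark` is assumed to be a SparkSession created with enableHiveSupport()
    and with spark.cassandra.connection.host pointing at the cluster.
    """
    df = spark.table(hive_table)
    (df.write
       .format("org.apache.spark.sql.cassandra")
       .options(**cassandra_options(keyspace, table))
       .mode("append")  # append to the existing table
       .save())


# Hypothetical usage (names are placeholders, not from the original post):
#   copy_hive_to_cassandra(spark, "analytics.events", "my_keyspace", "events")
```

Running this every few hours from a scheduler would cover the batch case. The 
DataFrame column names need to match the Cassandra column names.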

If you have to do this continuously, I would even consider using Kafka as the 
transport layer together with Kafka Connect. It also brings tooling for getting 
data into Cassandra from a variety of sources.

Rahul
On Aug 6, 2018, 3:16 PM -0400, srimugunthan dhandapani 
<srimugunthan.dhandap...@gmail.com>, wrote:
> Hi all,
> We have data that gets loaded into Hive/Presto every few hours.
> We want that data to be transferred to Cassandra tables.
> What are some of the high-performance ETL options for transferring data 
> from Hive or Presto into Cassandra?
>
> Also does anybody have any performance numbers comparing
> - loading data from S3 to Cassandra using sstableloader
> - and loading data from S3 to Cassandra using other means (like the Spark API)?
>
> Thanks,
> mugunthan
