If you didn't export the TTL explicitly, and didn't load it back, then
you'll get non-expiring data.
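
For reference, here's a sketch of doing that export/import with DSBulk
custom queries (the "pk" column name and the file path are placeholders,
not from your schema; the rest matches your table):

  # unload the data together with each cell's TTL and writetime
  dsbulk unload \
    -query "SELECT pk, secret, ttl(secret) AS secret_ttl, writetime(secret) AS secret_wt FROM ks_blah.cf_blah" \
    -url /tmp/cf_blah_export

  # load it back, re-applying the TTL and writetime per row
  dsbulk load \
    -query "INSERT INTO ks_blah.cf_blah (pk, secret) VALUES (:pk, :secret) USING TTL :secret_ttl AND TIMESTAMP :secret_wt" \
    -url /tmp/cf_blah_export

One caveat: re-applying the exported TTL restarts the countdown at load
time, so rows will live somewhat longer than they would have on the
source cluster.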

On Thu, Jul 16, 2020 at 7:48 PM Jai Bheemsen Rao Dhanwada <
jaibheem...@gmail.com> wrote:

> I tried to verify the metadata. The writetime is set to the insert time,
> but the TTL value shows as null. Is this expected? Does this mean the
> record will never expire after the insert?
> Is there any alternative way to preserve the TTL?
>
> In the new table, where the data was inserted with cqlsh and DSBulk:
> cqlsh > SELECT ttl(secret) from ks_blah.cf_blah ;
>
>  ttl(secret)
> --------------
>          null
>          null
>
> (2 rows)
>
> In the old table, where the data was written by the application:
>
> cqlsh > SELECT ttl(secret) from ks_old.cf_old ;
>
>  ttl(secret)
> --------------------
>          4517461
>          4525958
>
> (2 rows)
>
> On Wed, Jul 15, 2020 at 1:17 PM Jai Bheemsen Rao Dhanwada <
> jaibheem...@gmail.com> wrote:
>
>> thank you
>>
>> On Wed, Jul 15, 2020 at 1:11 PM Russell Spitzer <
>> russell.spit...@gmail.com> wrote:
>>
>>> Alex is referring to the "writetime" and "ttl" values for each cell.
>>> Most tools copy via CQL writes and, by default, don't copy those
>>> previous writetime and TTL values; instead they assign a new writetime
>>> that matches the copy time rather than the initial insert time.
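>>>
>>> (For illustration: CQL itself lets a writer set both values explicitly,
>>> so a copy tool could carry them over by issuing writes of this shape.
>>> The column names and values here are made up; the TTL and timestamp
>>> would come from the source row, with TIMESTAMP in microseconds since
>>> the epoch.)
>>>
>>>     INSERT INTO ks_blah.cf_blah (pk, secret)
>>>     VALUES ('some-key', 'some-secret')
>>>     USING TTL 4517461 AND TIMESTAMP 1594828800000000;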
>>>
>>> On Wed, Jul 15, 2020 at 3:01 PM Jai Bheemsen Rao Dhanwada <
>>> jaibheem...@gmail.com> wrote:
>>>
>>>> Hello Alex,
>>>>
>>>>
>>>>    - use DSBulk - it's a very effective tool for unloading & loading
>>>>    data from/to Cassandra/DSE. Use zstd compression for offloaded data
>>>>    to save disk space (see blog links below for more details). But
>>>>    *preserving metadata* could be a problem.
>>>>
>>>> Here, what exactly do you mean by "preserving metadata"? Would you
>>>> mind explaining?
>>>>
>>>> On Tue, Jul 14, 2020 at 8:50 AM Jai Bheemsen Rao Dhanwada <
>>>> jaibheem...@gmail.com> wrote:
>>>>
>>>>> Thank you for the suggestions
>>>>>
>>>>> On Tue, Jul 14, 2020 at 1:42 AM Alex Ott <alex...@gmail.com> wrote:
>>>>>
>>>>>> CQLSH definitely won't work for that amount of data, so you need to
>>>>>> use other tools.
>>>>>>
>>>>>> But before selecting them, you need to define requirements. For
>>>>>> example:
>>>>>>
>>>>>>    1. Are you copying the data into tables with exactly the same
>>>>>>    structure?
>>>>>>    2. Do you need to preserve metadata, like, writetime & TTL?
>>>>>>
>>>>>> Depending on that, you may have the following choices:
>>>>>>
>>>>>>    - use sstableloader - it will preserve all metadata, like TTL
>>>>>>    and writetime. You just need to copy the SSTable files, or stream
>>>>>>    them directly from the source cluster. But this requires copying
>>>>>>    the data into tables with exactly the same structure (and in the
>>>>>>    case of UDTs, the keyspace names must be the same). An example
>>>>>>    invocation is sketched after this list.
>>>>>>    - use DSBulk - it's a very effective tool for unloading & loading
>>>>>>    data from/to Cassandra/DSE. Use zstd compression for offloaded
>>>>>>    data to save disk space (see the blog links below for more
>>>>>>    details). But preserving metadata could be a problem.
>>>>>>    - use Spark + Spark Cassandra Connector. But here, too, preserving
>>>>>>    the metadata is not an easy task, and it requires programming to
>>>>>>    handle all the edge cases (see
>>>>>>    https://datastax-oss.atlassian.net/browse/SPARKC-596 for details)
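>>>>>>
>>>>>> As a sketch, an sstableloader run against the target cluster could
>>>>>> look like this (the host name is a placeholder; the directory must
>>>>>> be laid out as .../keyspace/table/ and contain the copied SSTable
>>>>>> files):
>>>>>>
>>>>>>     sstableloader -d target-node1.example.com /path/to/snapshot/ks_old/cf_old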
>>>>>>
>>>>>>
>>>>>> blog series on DSBulk:
>>>>>>
>>>>>>    - https://www.datastax.com/blog/2019/03/datastax-bulk-loader-introduction-and-loading
>>>>>>    - https://www.datastax.com/blog/2019/04/datastax-bulk-loader-more-loading
>>>>>>    - https://www.datastax.com/blog/2019/04/datastax-bulk-loader-common-settings
>>>>>>    - https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading
>>>>>>    - https://www.datastax.com/blog/2019/07/datastax-bulk-loader-counting
>>>>>>    - https://www.datastax.com/blog/2019/12/datastax-bulk-loader-examples-loading-other-locations
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 14, 2020 at 1:47 AM Jai Bheemsen Rao Dhanwada <
>>>>>> jaibheem...@gmail.com> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I would like to copy some data from one Cassandra cluster to another
>>>>>>> Cassandra cluster using the CQLSH COPY command. Is this a good
>>>>>>> approach if the dataset size on the source cluster is very large
>>>>>>> (500 GB - 1 TB)? If not, what is a safe approach? And are there any
>>>>>>> limitations/known issues to keep in mind before attempting this?
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> With best wishes,                    Alex Ott
>>>>>> http://alexott.net/
>>>>>> Twitter: alexott_en (English), alexott (Russian)
>>>>>>
>>>>>

-- 
With best wishes,                    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
