[jira] [Comment Edited] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

Lerh Chuan Low (JIRA) Thu, 18 Jan 2018 16:23:13 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16331469#comment-16331469
 ]


Lerh Chuan Low edited comment on CASSANDRA-8460 at 1/19/18 12:22 AM:
---------------------------------------------------------------------

Also just bumping this: [~jjirsa], [~bdeggleston], do you still have plans for 
it? It looks like most of the code changes in the patch you had previously 
(https://github.com/jeffjirsa/cassandra/commit/cc0ab8f733eef63ed0eaea30cc6f471b467c3ec5#diff-f628011a74763c0d0abc369bc8f5762bR126)
are still applicable. I am willing to give it a go. 

It sounds like we are still uncertain about how to implement this. My original 
thinking is in line with Jeff's: the archive directories would also keep an 
instance of {{XCompactionStrategy}} running for each of the repaired, unrepaired 
and pending-repair sets. Archived SSTables will still have to be read eventually, 
during repairs or when streaming to a newly added node, so it increasingly looks 
like moving them into an archive directory and never touching them again is not 
ideal. That said, I'm happy to implement it however people think is best, 
because there may be things that are not obvious to me. In this design, flushing 
won't be aware that an archive directory exists and will keep writing to the 
regular {{data_directories}}; compaction will eventually pick the SSTables up and 
move them into {{archive_data_directories}}, if applicable. 
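A rough Java sketch of that routing, just to make the idea concrete. Everything here is hypothetical: `ArchiveRouting`, `SSTableInfo`, `compactionDestination`, `strategySlots` and the `archiveAfter` threshold are illustrative names, not Cassandra classes or settings, and the rule "archive the output only when every input is older than the threshold" is one assumed policy (similar in spirit to DTCS's max_sstable_age_days).

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types exist in Cassandra.
public class ArchiveRouting {

    enum RepairState { REPAIRED, UNREPAIRED, PENDING_REPAIR }

    enum DirectoryKind { DATA, ARCHIVE }

    /** Minimal stand-in for an SSTable: its repair state and newest write timestamp. */
    static final class SSTableInfo {
        final String name;
        final RepairState repairState;
        final Instant maxTimestamp;

        SSTableInfo(String name, RepairState repairState, Instant maxTimestamp) {
            this.name = name;
            this.repairState = repairState;
            this.maxTimestamp = maxTimestamp;
        }
    }

    /** Flushes never look at archive directories; they always write to the data directories. */
    static DirectoryKind flushDestination() {
        return DirectoryKind.DATA;
    }

    /**
     * Compaction output is routed to the archive tier only when every input
     * SSTable's newest write is older than the (assumed) threshold.
     */
    static DirectoryKind compactionDestination(List<SSTableInfo> inputs, Instant now, Duration archiveAfter) {
        Instant cutoff = now.minus(archiveAfter);
        boolean allOld = !inputs.isEmpty()
                && inputs.stream().allMatch(s -> s.maxTimestamp.isBefore(cutoff));
        return allOld ? DirectoryKind.ARCHIVE : DirectoryKind.DATA;
    }

    /** One compaction strategy instance per (directory kind, repair state) pair. */
    static List<String> strategySlots() {
        List<String> slots = new ArrayList<>();
        for (DirectoryKind dir : DirectoryKind.values()) {
            for (RepairState state : RepairState.values()) {
                slots.add(dir + "/" + state);
            }
        }
        return slots;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2018-01-19T00:00:00Z");
        List<SSTableInfo> inputs = List.of(
                new SSTableInfo("a", RepairState.REPAIRED, now.minus(Duration.ofDays(400))),
                new SSTableInfo("b", RepairState.REPAIRED, now.minus(Duration.ofDays(380))));
        System.out.println(compactionDestination(inputs, now, Duration.ofDays(365))); // ARCHIVE
        System.out.println(strategySlots());
    }
}
```

Flushes stay oblivious to the archive tier, and `strategySlots` simply lists the six (directory, repair state) pairs that would each need their own strategy instance under this design.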

Related to that, one thing I haven't been able to work out yet is whether the 
archive directory needs the same guarantee as a multiple data directories 
setup, i.e. that a single vnode/token range cannot span it and another 
directory. If so, we would have to include it when distributing token ranges 
across the directories. 
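For reference, a minimal sketch of what including it could mean mechanically, assuming a Murmur3-style token space of signed 64-bit longs: the token range is cut into one contiguous slice per directory, so every token maps to exactly one directory. `DirectoryBoundaries`, `boundaries` and `directoryFor` are hypothetical names, and this is a simplification, not how Cassandra actually computes disk boundaries. Under such a split, placement is decided by token rather than by age, which is part of why it's unclear whether the archive directory should take part.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Cassandra's actual disk-boundary code.
public class DirectoryBoundaries {

    /**
     * Splits the inclusive token range [min, max] into n contiguous slices
     * (one per directory) and returns each slice's inclusive upper bound.
     * BigInteger avoids overflow when the range spans the whole signed 64-bit space.
     */
    static List<Long> boundaries(long min, long max, int n) {
        if (n <= 0 || min > max) {
            throw new IllegalArgumentException("need n > 0 and min <= max");
        }
        BigInteger lo = BigInteger.valueOf(min);
        BigInteger size = BigInteger.valueOf(max).subtract(lo).add(BigInteger.ONE); // number of tokens
        BigInteger parts = BigInteger.valueOf(n);
        List<Long> bounds = new ArrayList<>(n);
        for (int i = 1; i <= n; i++) {
            // Upper bound of slice i is min + floor(size * i / n) - 1; for i == n this is max.
            BigInteger upper = lo.add(size.multiply(BigInteger.valueOf(i)).divide(parts)).subtract(BigInteger.ONE);
            bounds.add(upper.longValueExact());
        }
        return bounds;
    }

    /** Returns the index of the single directory whose slice contains the token. */
    static int directoryFor(long token, List<Long> bounds) {
        for (int i = 0; i < bounds.size(); i++) {
            if (token <= bounds.get(i)) {
                return i;
            }
        }
        throw new IllegalArgumentException("token is above the last boundary");
    }

    public static void main(String[] args) {
        // Two regular data directories plus one archive directory.
        List<Long> bounds = boundaries(Long.MIN_VALUE, Long.MAX_VALUE, 3);
        System.out.println(bounds);
        System.out.println(directoryFor(0L, bounds)); // 1
    }
}
```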

[~stone] does raise an interesting point, though, about decoupling this from 
the compaction strategy (CS) and using a periodic background task that archives 
SSTables. In that case, I'm guessing you would decide what to archive based on 
the min/max timestamps in the SSTable metadata? Or just on the last-modified 
time of the SSTable files? Would it be a YAML property, so that any SSTable 
whose max timestamp is more than X days old gets archived? 
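A minimal sketch of such a periodic task, assuming the threshold comes from a hypothetical YAML property (modelled here as the `archiveAfter` parameter) and supporting both candidate criteria. `PeriodicArchiver`, `SSTable` and `Criterion` are made-up names; real SSTable metadata stores client-supplied write timestamps (microseconds by convention), which this sketch simplifies to `Instant`.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch of an archiver decoupled from the compaction strategy.
public class PeriodicArchiver {

    enum Criterion { MAX_TIMESTAMP, LAST_MODIFIED }

    /** Stand-in for an SSTable: newest write timestamp from its metadata, and file mtime. */
    static final class SSTable {
        final String name;
        final Instant maxTimestamp;
        final Instant lastModified;

        SSTable(String name, Instant maxTimestamp, Instant lastModified) {
            this.name = name;
            this.maxTimestamp = maxTimestamp;
            this.lastModified = lastModified;
        }
    }

    /** Picks SSTables whose chosen timestamp is more than archiveAfter in the past. */
    static List<SSTable> selectForArchiving(List<SSTable> live, Instant now,
                                            Duration archiveAfter, Criterion criterion) {
        Instant cutoff = now.minus(archiveAfter);
        List<SSTable> selected = new ArrayList<>();
        for (SSTable s : live) {
            Instant t = criterion == Criterion.MAX_TIMESTAMP ? s.maxTimestamp : s.lastModified;
            if (t.isBefore(cutoff)) {
                selected.add(s);
            }
        }
        return selected;
    }

    /** Runs the selection every period and hands each candidate to the archive action. */
    static ScheduledExecutorService start(Supplier<List<SSTable>> live, Duration archiveAfter,
                                          Criterion criterion, Duration period,
                                          Consumer<SSTable> archive) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(() -> {
            try {
                selectForArchiving(live.get(), Instant.now(), archiveAfter, criterion).forEach(archive);
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so report it and carry on.
                System.err.println("archiving run failed: " + e);
            }
        }, period.toMillis(), period.toMillis(), TimeUnit.MILLISECONDS);
        return executor;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2018-01-19T00:00:00Z");
        List<SSTable> live = List.of(
                new SSTable("old-data", now.minus(Duration.ofDays(40)), now.minus(Duration.ofDays(1))),
                new SSTable("new-data", now.minus(Duration.ofDays(2)), now.minus(Duration.ofDays(2))));
        for (SSTable s : selectForArchiving(live, now, Duration.ofDays(30), Criterion.MAX_TIMESTAMP)) {
            System.out.println("archive " + s.name); // archive old-data
        }
    }
}
```

The example shows why the criterion matters: an SSTable that was just rewritten by compaction has a recent last-modified time even if all of its data is old, so `LAST_MODIFIED` would not select it while `MAX_TIMESTAMP` would.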




> Make it possible to move non-compacting sstables to slow/big storage in DTCS
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8460
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8460
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>            Priority: Major
>              Labels: doc-impacting, dtcs
>             Fix For: 4.x
>
>
> It would be nice if we could configure DTCS to have a set of extra data 
> directories where we move the sstables once they are older than 
> max_sstable_age_days. 
> This would enable users to have a quick, small SSD for hot, new data, and big 
> spinning disks for data that is rarely read and never compacted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
