[ 
https://issues.apache.org/jira/browse/CASSANDRA-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903877#comment-13903877
 ] 

Daniel Shelepov commented on CASSANDRA-1983:
--------------------------------------------

Notes so far:

- sstable filenames are controlled by the io/sstable/Descriptor class, which 
encapsulates a few parameters including "generation" -- the increasing integer 
in question.
- dropping generation in favor of a uuid seems questionable, given that 
generation is used by a wide variety of clients in the codebase.  So the most 
likely approach is uuid + generation side by side.
- using the host id as the uuid is easy conceptually, but will violate 
layering, because code in io will start to depend on db and/or service.  Plus 
there is a potential bootstrapping problem where system sstables need to be 
initialized early on during boot, and it's not clear whether the unique host id 
is available early enough to feed into system sstable descriptors.
- random uuids are also tricky, because sstable names will no longer be 
discoverable without directory lookups.  Some code (particularly in unit tests) 
leans on the ability to synthesize sstable names without touching the 
filesystem.  It's possible to persist these uuids in one of the system tables, 
but it will have to be a local table, and, regardless, changing system schema 
can make this a breaking change.
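
To make the "uuid + generation side by side" option concrete, here is a minimal 
sketch of a name that carries both and can still be synthesized or parsed 
without touching the filesystem. Everything in it is an assumption for 
illustration: the class SSTableName, its fields, and the layout 
<cf>-<generation>[-<uuid>]-<component>.db are hypothetical and are not the 
actual Descriptor API or the real naming scheme.

```java
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the naming part of Descriptor; not Cassandra code.
final class SSTableName {
    // Assumed layout: <cf>-<generation>[-<uuid>]-<component>.db
    // The uuid group is optional so that legacy names keep parsing.
    private static final Pattern NAME = Pattern.compile(
        "(.+?)-(\\d+)"
        + "(?:-([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}))?"
        + "-([A-Za-z]+)\\.db");

    final String cfName;
    final int generation;
    final UUID uuid;        // null for legacy (generation-only) names
    final String component;

    SSTableName(String cfName, int generation, UUID uuid, String component) {
        this.cfName = cfName;
        this.generation = generation;
        this.uuid = uuid;
        this.component = component;
    }

    // Parse either a legacy name or a name that also carries a uuid.
    static SSTableName parse(String filename) {
        Matcher m = NAME.matcher(filename);
        if (!m.matches())
            throw new IllegalArgumentException("not an sstable name: " + filename);
        String u = m.group(3);
        return new SSTableName(m.group(1),
                               Integer.parseInt(m.group(2)),
                               u == null ? null : UUID.fromString(u),
                               m.group(4));
    }

    // Synthesize the filename from the fields alone, with no directory lookup.
    String filename() {
        StringBuilder sb = new StringBuilder(cfName).append('-').append(generation);
        if (uuid != null)
            sb.append('-').append(uuid);
        return sb.append('-').append(component).append(".db").toString();
    }
}
```

In this sketch, existing clients keep reading the generation field unchanged, 
and only code that cares about cross-node uniqueness looks at the uuid. It does 
not address where the uuid comes from, which is the hard part described above.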

I haven't yet found a cost-effective fix that would involve actually modifying 
the existing naming scheme.

The latest idea I have is to create a directory that will hold symlinks to real 
sstables (symlinks are available in Java 7).  Symlink names will contain the 
UUIDs.  The only extra piece of code would be creating and tearing down 
symlinks when real sstables are created and deleted.  End users could then 
access sstables through this symlink directory whenever doing related 
maintenance. The last piece would be making sure that appropriate clients, such 
as the compactor, can consume sstables with and without UUIDs.
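
A rough sketch of the symlink bookkeeping, using the Java 7 java.nio.file API. 
The class SSTableSymlinks, its method names, and the "<uuid>-<original name>" 
link naming are assumptions made up for this example; nothing here exists in 
the codebase, and the hooks into sstable creation and deletion are not shown.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical helper that maintains a directory of uuid-named symlinks.
final class SSTableSymlinks {
    private final Path linkDir;

    SSTableSymlinks(Path linkDir) throws IOException {
        // Create the symlink directory (and any missing parents) if needed.
        this.linkDir = Files.createDirectories(linkDir);
    }

    // Assumed link naming: <uuid>-<original sstable filename>.
    private Path linkFor(Path sstable, UUID id) {
        return linkDir.resolve(id + "-" + sstable.getFileName());
    }

    // To be called when a real sstable file has been created.
    Path link(Path sstable, UUID id) throws IOException {
        return Files.createSymbolicLink(linkFor(sstable, id), sstable.toAbsolutePath());
    }

    // To be called when the real sstable file is deleted.
    // Removes only the link itself; the link target is never touched.
    boolean unlink(Path sstable, UUID id) throws IOException {
        return Files.deleteIfExists(linkFor(sstable, id));
    }
}
```

Note that Files.createSymbolicLink can throw UnsupportedOperationException or 
fail on platforms and filesystems that do not permit symlinks, so a real 
implementation would need a fallback for those cases.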

I'll work on this some more tomorrow, but it'll probably spill over into next 
week (or later).

Comments welcome.

> Make sstable filenames contain a UUID instead of increasing integer
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-1983
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1983
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: David King
>            Priority: Minor
>
> sstable filenames look like CFName-1569-Index.db, containing an integer for 
> uniqueness. This makes it possible (however unlikely) that the integer could 
> overflow, which could be a problem. It also makes it difficult to collapse 
> multiple nodes into a single one with rsync. I do this occasionally for 
> testing: I'll copy our 20 node cluster into only 3 nodes by copying all of 
> the data files and running cleanup; at present this requires a manual step of 
> uniqifying the overlapping sstable names. Instead of an incrementing 
> integer, it would be handy if these contained a UUID or somesuch that 
> guarantees uniqueness across the cluster.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
