[ https://issues.apache.org/jira/browse/CASSANDRA-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750785#action_12750785 ]

Jonathan Ellis commented on CASSANDRA-418:
------------------------------------------

The compaction code relies on the bucketizer to keep files of the same 
compaction count together (one bucket of sstables that have been compacted 
twice, another of sstables that have been compacted three times), so that you 
are never compacting sstables of consecutive generations -- all will have 
even numbers, or all odd.  Something has broken that invariant.
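
To make the failure concrete, here is a toy model of the numbering (the 
min+1 rule comes from the report below; the even/odd bucketing is assumed 
here for illustration):

    import java.util.Collections;
    import java.util.List;

    public class GenerationInvariant
    {
        // Compaction output is named min(input generations) + 1.
        static int compactionOutputGeneration(List<Integer> inputs)
        {
            return Collections.min(inputs) + 1;
        }

        public static void main(String[] args)
        {
            // Invariant intact: the bucket holds only even generations, so
            // min+1 is odd and cannot collide with any input.
            System.out.println(
                compactionOutputGeneration(List.of(6034, 6036, 6040))); // 6035

            // Invariant broken: consecutive generations share a bucket, and
            // min+1 lands on an existing input -- the clash reported below.
            System.out.println(
                compactionOutputGeneration(List.of(6037, 6038, 6040, 6042))); // 6038
        }
    }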

Rather than try to band-aid the bucketizer, I think making the 
generation generator more robust is the way to go.  The invariant above 
seems like a flimsy property to try to preserve.

My vote would be to simplify: just pick the next monotonically increasing int 
any time we need a new tmp sstable file, whether for flush, compaction, or 
bootstrap -- i.e., via CFS.getTempSSTableFileName, without the extra 
increment.
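
In code, the proposal amounts to something like this (a sketch, not the 
actual patch; everything except the getTempSSTableFileName name is 
illustrative, including the file-name format):

    import java.util.concurrent.atomic.AtomicInteger;

    public class SSTableNaming
    {
        // One counter for everything: flush, compaction, and bootstrap all
        // draw from it, so no two files can ever share a generation.
        private static final AtomicInteger generation = new AtomicInteger();

        // No "extra increment": every caller simply takes the next int.
        static String getTempSSTableFileName(String columnFamily)
        {
            return columnFamily + "-" + generation.incrementAndGet()
                   + "-tmp-Data.db";
        }

        public static void main(String[] args)
        {
            System.out.println(getTempSSTableFileName("FriendActions")); // -1-tmp-
            System.out.println(getTempSSTableFileName("FriendActions")); // -2-tmp-
        }
    }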

The reason FB historically tried to be fancy is that they were trying to 
optimize away reading older sstables at all if the data being queried was 
found in a newer one.  The rule that "only new sstables get a number from the 
atomic int, and the compactions fit in between" was there to preserve this: 
sort on the generation number, and higher ones are always newer.
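
The short-circuit that ordering was meant to enable looks roughly like this 
(a sketch only -- the SSTable type and lookup method are invented, and as 
noted next, the optimization is unsafe anyway):

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    public class NewestFirstRead
    {
        static class SSTable
        {
            final int generation;
            SSTable(int generation) { this.generation = generation; }
            // Stub lookup; null means the key is not in this sstable.
            String lookup(String key) { return null; }
        }

        // If a higher generation always meant strictly newer data, a read
        // could scan newest-first and stop at the first hit, never opening
        // the older files at all.
        static String read(List<SSTable> sstables, String key)
        {
            List<SSTable> byNewest = new ArrayList<>(sstables);
            byNewest.sort(Comparator.comparingInt(
                    (SSTable s) -> s.generation).reversed());
            for (SSTable sstable : byNewest)
            {
                String value = sstable.lookup(key);
                if (value != null)
                    return value; // short-circuit: older sstables untouched
            }
            return null;
        }
    }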

But that can't work (see CASSANDRA-223), so we always do a full merge across 
all sstables now, which means we can simplify this safely.

> SSTable generation clash during compaction
> ------------------------------------------
>
>                 Key: CASSANDRA-418
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-418
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Sammy Yu
>            Assignee: Sammy Yu
>             Fix For: 0.5
>
>
> We found that one of our nodes started getting timeouts for get_slice.  
> Looking further, we found that CFS.ssTables_ references an SSTable that 
> doesn't exist on the file system.
> Walking down the log, we see that the sstable in question, 6038, is being 
> compacted onto itself (in terms of final filename; file-wise it is first 
> written to a -tmp file):
> system.log.2009-09-01: INFO [MINOR-COMPACTION-POOL:1] 2009-09-01 23:50:07,553 
> ColumnFamilyStore.java (line 1067) Compacting 
> [/mnt/var/cassandra/data/Digg/FriendActions-6037-Data.db,
>  /mnt/var/cassandra/data/Digg/FriendActions-6038-Data.db,
>  /mnt/var/cassandra/data/Digg/FriendActions-6040-Data.db,
>  /mnt/var/cassandra/data/Digg/FriendActions-6042-Data.db]
> system.log.2009-09-01: INFO [MINOR-COMPACTION-POOL:1] 2009-09-01 23:51:43,727 
> ColumnFamilyStore.java (line 1209) Compacted to
> /mnt/var/cassandra/data/Digg/FriendActions-6038-Data.db.  0/1010269806 bytes 
> for 9482/9373 keys read/written.  Time: 96173ms.
> It appears the generation number is generated by taking the lowest number 
> in the list of files to be compacted and adding 1.  In this scenario that 
> is 6037+1 = 6038.
> The code in CFS.doFileCompaction will remove that key, add it back, and 
> then remove it again, hence the error we were seeing.
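>
> One plausible reading of that remove/add/remove churn as a toy model, 
> assuming ssTables_ is keyed by filename (the map type and values here 
> are illustrative):
>
>     import java.util.Map;
>     import java.util.TreeMap;
>
>     public class MapChurn
>     {
>         public static void main(String[] args)
>         {
>             Map<String, String> ssTables = new TreeMap<>();
>             ssTables.put("FriendActions-6037-Data.db", "old 6037");
>             ssTables.put("FriendActions-6038-Data.db", "old 6038");
>
>             // The sequence described above, all hitting the same key:
>             ssTables.remove("FriendActions-6038-Data.db"); // drop the input
>             ssTables.put("FriendActions-6038-Data.db", "compacted"); // add output
>             ssTables.remove("FriendActions-6038-Data.db"); // cleanup, same key
>             ssTables.remove("FriendActions-6037-Data.db");
>
>             // Net effect: the map and the files on disk no longer agree
>             // about generation 6038, which is how reads end up referencing
>             // an sstable that is missing from the file system.
>             System.out.println(ssTables); // {} -- the new 6038 is untracked
>         }
>     }
>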
> Should the generation number be generated some other way, or should we 
> update doFileCompaction to be smarter?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
