Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.

The "ArchitectureSSTable" page has been changed by Chris Goffinet.
http://wiki.apache.org/cassandra/ArchitectureSSTable

--------------------------------------------------

New page:
= SSTable Overview =
DRAFT. Notes on documenting how SSTables work in Cassandra (data format, 
indexing, serialization, searching)

Each SSTable consists of 3 separate files, and SSTables are created per column family.

 1. Bloom Filter
 1. Index
 1. Data

When a new key is added to an SSTable, it goes through the steps below. All 
keys are sorted before writing.

 1. Serialize Index
  1. Sort columns for key
  1. Serialize columns bloom filter
   1. Loop through the columns and subcolumns that make up the column family
    1. Sum columnCount from each column's getObjectCount (includes 
subcolumn counts for super columns)
    1. Create a bloom filter sized by columnCount
    1. Loop through the columns again and add each column name to the bloom filter
     1. If a super column is detected, loop through its subcolumns and add each subcolumn name
   1. Write bloom filter hash count (int)
   1. Write serialized bloom filter length (int)
   1. Write serialized bytes of bloom filter
  1. Start indexing based on column family comparator
   1. If there are no columns, write integer zero and return
   1. Iterate over columns until getColumnIndexSize() is exceeded (default is 
64KB)
    1. Construct a new IndexInfo that consists of the last column name before 
the size was exceeded, the existing column name, startPosition, and endPosition - startPosition
   1. Write indexSizeInBytes (int)
   1. Serialize each IndexInfo object (firstname is the last column name before 
the size was exceeded, and lastname is the existing column name)
    1. Write byte: high byte of firstname length, (length >> 8) & 0xFF
    1. Write byte: low byte of firstname length, (length & 0xFF)
    1. Write bytes of firstname
    1. Write byte: high byte of lastname length, (length >> 8) & 0xFF
    1. Write byte: low byte of lastname length, (length & 0xFF)
    1. Write bytes of lastname
    1. Write long startPosition
    1. Write long endPosition - startPosition
 1. Serialize Data
  1. Write columnFamily localDeletionTime (int)
  1. Write columnFamily markedForDeleteAt (long)
  1. Sort columns
  1. Write the number of columns (int)
  1. Determine Column Serializer and Serialize Column
   1. Determine the length of the column name as length
   1. Write byte: high byte of length, (length >> 8) & 0xFF
   1. Write byte: low byte of length, length & 0xFF
   1. Write bytes of column name
   1. Write boolean isMarkedForDelete
   1. Write long timestamp
   1. Write column value length (int)
   1. Write column value as bytes
 1. Write to SSTable Data File
  1. Write out the row key as UTF; its form depends on the partitioner
   1. Random Partitioner
    1. key token + DELIMITER + key name
    1. Delimiter is colon
  1. Write size of row value (int)
  1. Write bytes of row value
 1. Write SSTable Bloom Filter and SSTable Index
  1. Add the disk key to the bloom filter; its form depends on the partitioner
   1. Random Partitioner
    1. key token + DELIMITER + key name
    1. Delimiter is colon
  1. Write disk key to SSTable Index file (UTF)
  1. Write the data file position recorded before the "Write to SSTable Data File" step (int)
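
The column and IndexInfo byte layouts listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names (ColumnLayoutSketch, writeName, writeColumn, writeIndexInfo, columnBytes) are hypothetical and are not Cassandra's own serializer classes, and the sketch assumes big-endian writes as done by java.io.DataOutputStream, with names and values already available as byte arrays.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not Cassandra's own serializer.
class ColumnLayoutSketch {

    // Writes a name as a 2-byte length (high byte, then low byte) followed by the name bytes.
    static void writeName(DataOutputStream out, byte[] name) throws IOException {
        int length = name.length;
        out.writeByte((length >> 8) & 0xFF); // high byte of length
        out.writeByte(length & 0xFF);        // low byte of length
        out.write(name);                     // name bytes
    }

    // One column, in the order listed under "Determine Column Serializer and Serialize Column".
    static void writeColumn(DataOutputStream out, byte[] name, boolean markedForDelete,
                            long timestamp, byte[] value) throws IOException {
        writeName(out, name);
        out.writeBoolean(markedForDelete);   // isMarkedForDelete
        out.writeLong(timestamp);            // timestamp
        out.writeInt(value.length);          // column value length
        out.write(value);                    // column value bytes
    }

    // One IndexInfo entry, in the order listed under "Serialize each IndexInfo object".
    static void writeIndexInfo(DataOutputStream out, byte[] firstName, byte[] lastName,
                               long startPosition, long width) throws IOException {
        writeName(out, firstName);
        writeName(out, lastName);
        out.writeLong(startPosition);
        out.writeLong(width);                // endPosition - startPosition
    }

    // Convenience wrapper (an assumption for the example): UTF-8 name and value.
    static byte[] columnBytes(String name, boolean markedForDelete,
                              long timestamp, String value) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            writeColumn(new DataOutputStream(buf), name.getBytes(StandardCharsets.UTF_8),
                        markedForDelete, timestamp, value.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }
}
```

With this sketch, columnBytes("age", false, 1L, "42") yields 20 bytes: 2 for the name length, 3 for the name, 1 for the delete flag, 8 for the timestamp, 4 for the value length, and 2 for the value. The two-byte length prefix limits a name to 65535 bytes.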

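The "Write to SSTable Data File" and index-entry steps in the list above might look roughly like the sketch below. It is an assumption-laden illustration, not the actual Cassandra writer: SSTableAppendSketch, diskKey, and append are hypothetical names, the token is passed in as an already computed string rather than derived by the partitioner, in-memory streams stand in for the data and index files, and the bloom filter update is omitted.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of appending one row; not Cassandra's own writer class.
class SSTableAppendSketch {
    static final String DELIMITER = ":"; // Random Partitioner delimiter, per the notes above

    // Disk key for the Random Partitioner: key token + DELIMITER + key name.
    static String diskKey(String token, String keyName) {
        return token + DELIMITER + keyName;
    }

    // Appends one serialized row to the data stream and one entry to the index stream.
    // Callers are assumed to append keys in sorted order.
    // Returns the data position recorded before the row was written.
    static int append(DataOutputStream data, DataOutputStream index,
                      String token, String keyName, byte[] rowValue) {
        try {
            int position = data.size();      // bytes written so far = position before this row
            String key = diskKey(token, keyName);

            data.writeUTF(key);              // row key in UTF
            data.writeInt(rowValue.length);  // size of row value
            data.write(rowValue);            // bytes of row value

            index.writeUTF(key);             // disk key in the index file
            index.writeInt(position);        // position before the data write (int, per the notes)
            return position;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the index stores the position taken before the row is written, a reader that finds a key in the index can seek straight to the start of that row's entry in the data file.
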