[jira] Updated: (CASSANDRA-226) Make time-sorted CFs behave consistently

Jonathan Ellis (JIRA) Fri, 12 Jun 2009 21:25:35 -0700

     [ https://issues.apache.org/jira/browse/CASSANDRA-226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jonathan Ellis updated CASSANDRA-226:
-------------------------------------

    Description: 
Time-sorted CFs are indexed and sorted by time in the SSTable, so lookup by 
name is inefficient, and compaction is inefficient too, because we combine 
columns by name even though they are sorted by time (CASSANDRA-16).
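
To make the compaction point concrete, here is a minimal Java sketch, assuming each SSTable run is a list of columns already sorted by (time, name), the ordering this issue proposes. The `Column` and `TimeSortedCompaction` classes are hypothetical illustrations, not Cassandra's actual compaction code. Merging in the same order the data is stored in is a single linear pass, and only columns with an identical (time, name) collapse, so the same name written at different times survives.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical column: a name, a timestamp, and a value.
final class Column {
    final String name;
    final long time;
    final String value;

    Column(String name, long time, String value) {
        this.name = name;
        this.time = time;
        this.value = value;
    }
}

// Hypothetical compaction helper; not Cassandra's real implementation.
final class TimeSortedCompaction {
    // Order by time first, then by name: the proposed (time, name) key.
    static final Comparator<Column> TIME_THEN_NAME = (a, b) -> {
        int byTime = Long.compare(a.time, b.time);
        return byTime != 0 ? byTime : a.name.compareTo(b.name);
    };

    // Merges two runs that are each already sorted by TIME_THEN_NAME.
    // Only columns with an identical (time, name) key collapse (the newer run wins);
    // the same name at different times is kept, like appends to a list.
    static List<Column> merge(List<Column> older, List<Column> newer) {
        List<Column> out = new ArrayList<>(older.size() + newer.size());
        int i = 0;
        int j = 0;
        while (i < older.size() && j < newer.size()) {
            int c = TIME_THEN_NAME.compare(older.get(i), newer.get(j));
            if (c < 0) {
                out.add(older.get(i++));
            } else if (c > 0) {
                out.add(newer.get(j++));
            } else {
                out.add(newer.get(j++));
                i++; // drop the superseded column from the older run
            }
        }
        while (i < older.size()) {
            out.add(older.get(i++));
        }
        while (j < newer.size()) {
            out.add(newer.get(j++));
        }
        return out;
    }
}
```

Merging a run holding `a@1, b@2` with a newer run holding `a@2, b@2` yields `a@1, a@2, b@2`, where `b@2` comes from the newer run.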

The right fix is to recognize that for time-sorted columns the right "key", as 
it were, is (time, name), not name.  We should allow slice by time and lookup 
by time or by (time, name), but not by name alone.  (Similarly, we should not 
allow lookup by time on name-sorted CFs.)  This means that a name will not 
necessarily be unique in a time-sorted CF, but that is the right behavior!  
Time-sorted CFs are more like appending to a list than putting to a map.
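
A minimal sketch of these semantics, assuming `long` timestamps and `String` values; `TimeSortedColumns` and its methods are hypothetical names for illustration, not an existing Cassandra API. It keys columns by (time, name) using nested sorted maps, supports slice by time and lookup by time or (time, name), and deliberately offers no lookup by name alone.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical time-sorted CF: columns are keyed by (time, name), not by name.
final class TimeSortedColumns {
    // Outer map ordered by time; each inner map orders the names written at that time.
    private final NavigableMap<Long, NavigableMap<String, String>> byTime = new TreeMap<>();

    // Writing an existing name at a new time adds a column rather than replacing one.
    void append(long time, String name, String value) {
        byTime.computeIfAbsent(time, t -> new TreeMap<>()).put(name, value);
    }

    // Slice by time: values of all columns with fromTime <= time <= toTime,
    // in (time, name) order. Requires fromTime <= toTime.
    List<String> slice(long fromTime, long toTime) {
        List<String> values = new ArrayList<>();
        for (NavigableMap<String, String> columns
                : byTime.subMap(fromTime, true, toTime, true).values()) {
            values.addAll(columns.values());
        }
        return values;
    }

    // Lookup by time alone: every name written at that timestamp.
    Map<String, String> get(long time) {
        NavigableMap<String, String> columns = byTime.get(time);
        return columns == null
            ? Collections.<String, String>emptyMap()
            : Collections.unmodifiableMap(columns);
    }

    // Lookup by the full (time, name) key; returns null if absent.
    String get(long time, String name) {
        return get(time).get(name);
    }

    // Deliberately no get(String name): a name alone does not identify a column.
}
```

For example, appending `status` at times 1 and 2 keeps both columns, and `slice(1, 2)` returns them in time order, which is the list-like behavior described above.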

(As part of this we could track the most recent append in time-sorted CFs to 
prevent the bug in CASSANDRA-223.  I am not sure that is the "right" way to go 
but I do mention it as a possibility that this change enables.)

In the code, this will allow more regular treatment of CFs and less special 
casing.  (It will also allow using only a single map for memtable columns, 
reducing memory usage -- see CASSANDRA-51.)

Implementation notes: we may be able to represent both types of CF with a 
pluggable Comparator + Indexer combination, which would solve CASSANDRA-185 and 
CASSANDRA-189 at the same time.
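
One way to read the pluggable-Comparator idea, sketched in Java under the assumption that a CF is a single sorted set whose comparator defines both ordering and column identity. `SortedColumnFamily`, `BY_NAME`, and `BY_TIME_THEN_NAME` are hypothetical, and the Indexer half of the proposal is not shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical column: a name, a timestamp, and a value.
final class Column {
    final String name;
    final long time;
    final String value;

    Column(String name, long time, String value) {
        this.name = name;
        this.time = time;
        this.value = value;
    }
}

// Hypothetical CF whose ordering, and therefore column identity, comes from a Comparator.
final class SortedColumnFamily {
    // Name-sorted CF: the name alone is the key, so re-putting a name replaces it.
    static final Comparator<Column> BY_NAME = (a, b) -> a.name.compareTo(b.name);

    // Time-sorted CF: (time, name) is the key, so one name may appear at many times.
    static final Comparator<Column> BY_TIME_THEN_NAME = (a, b) -> {
        int byTime = Long.compare(a.time, b.time);
        return byTime != 0 ? byTime : a.name.compareTo(b.name);
    };

    // One sorted structure serves both kinds of CF; only the comparator differs.
    private final TreeSet<Column> columns;

    SortedColumnFamily(Comparator<Column> comparator) {
        this.columns = new TreeSet<>(comparator);
    }

    // Insert, replacing any column the comparator considers equal.
    void put(Column column) {
        columns.remove(column);
        columns.add(column);
    }

    // Snapshot of the columns in comparator order.
    List<Column> all() {
        return new ArrayList<>(columns);
    }
}
```

With `BY_NAME`, putting `a` at times 1 and 2 leaves one column; with `BY_TIME_THEN_NAME`, the same two puts leave two. Because only the comparator differs, a single map can hold either kind of CF, in the spirit of the single-map memtable mentioned for CASSANDRA-51.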

  was:
Time-sorted CFs are indexed and sorted by time in the SSTable, so lookup by 
name is inefficient, and compaction is inefficient too, because we combine 
columns by name even though they are sorted by time.

The right fix is to recognize that for time-sorted columns the right "key", as 
it were, is (time, name), not name.  We should allow slice by time and lookup 
by time or by (time, name), but not by name alone.  (Similarly, we should not 
allow lookup by time on name-sorted CFs.)  This means that a name will not 
necessarily be unique in a time-sorted CF, but that is the right behavior!  
Time-sorted CFs are more like appending to a list than putting to a map.

(As part of this we could track the most recent append in time-sorted CFs to 
prevent the bug in CASSANDRA-223.  I am not sure that is the "right" way to go 
but I do mention it as a possibility that this change enables.)

In the code, this will allow more regular treatment of CFs and less special 
casing.  (It will also allow using only a single map for memtable columns, 
reducing memory usage -- see CASSANDRA-51.)

Implementation notes: we may be able to represent both types of CF with a 
pluggable Comparator + Indexer combination, which would solve CASSANDRA-185 and 
CASSANDRA-189 at the same time.


> Make time-sorted CFs behave consistently
> ----------------------------------------
>
>                 Key: CASSANDRA-226
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-226
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
