[ https://issues.apache.org/jira/browse/CASSANDRA-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268115#comment-13268115 ]

Thorkild Stray  commented on CASSANDRA-4205:
--------------------------------------------

The original description is wrong (as we now know). Backporting the tool from 
CASSANDRA-4211 and testing shows that the SSTables do indeed have the correct 
max timestamp metadata, and that everything is updated correctly during 
compaction, upgradesstables, and so on.

After further debugging I found that the reason I was hitting all the 
SSTables was that CollationController fell back to calling collectAllData(), 
because the filter was not an instance of NamesQueryFilter:


CollationController.java:
{noformat}
public ColumnFamily getTopLevelColumns()
{
    return filter.filter instanceof NamesQueryFilter
           && (cfs.metadata.cfType == ColumnFamilyType.Standard || filter.path.superColumnName != null)
           && cfs.metadata.getDefaultValidator() != CounterColumnType.instance
           ? collectTimeOrderedData()
           : collectAllData();
}
{noformat}

This makes sense, but I struggled to work out why that happened on some CFs 
and not on others. After tracing it, I found that the trigger is whether or 
not the CF has the row-cache enabled. If the row-cache is enabled, 
ColumnFamilyStore.cacheRow(..) is called, and on a cache-miss it ends up 
calling ColumnFamilyStore.getTopLevelColumns(...), which in turn creates a new 
CollationController with an IdentityFilter as the argument, and thus ends up 
using collectAllData().
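
To make the path concrete, here is a simplified sketch of what I believe 
happens on a cache-miss (class and method shapes paraphrased from my reading 
of the 1.0 source, not verbatim):

{noformat}
// Paraphrased sketch, not the actual source: on a row-cache miss the whole
// row has to be materialized, so an identity filter is used instead of the
// caller's NamesQueryFilter, and CollationController therefore takes the
// collectAllData() branch shown above.
public ColumnFamily cacheRow(DecoratedKey key)
{
    ColumnFamily cached = rowCache.get(key);
    if (cached == null)
    {
        QueryFilter everything = QueryFilter.getIdentityFilter(key, new QueryPath(columnFamily));
        cached = getTopLevelColumns(everything); // -> new CollationController(..., everything) -> collectAllData()
        rowCache.put(key, cached);
    }
    return cached;
}
{noformat}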

In our case we get a lot of cache-misses, since the update frequency is so 
high that we often read an entry only once before it is replaced. 

Testing with 'stress' before and after showed that we went from touching all 
SSTables for every query to 100% of reads being satisfied by a single SSTable 
read (according to cfhistograms).
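
For reference, this is how we checked (standard nodetool invocation; host, 
keyspace and CF names are placeholders), reading the "SSTables" column of the 
output:

{noformat}
nodetool -h <host> cfhistograms <keyspace> <columnfamily>
{noformat}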

I tested a preliminary configuration without the row-cache in production, and 
it looked like a lot more of the queries were served by a single SSTable. We 
still have stragglers using more, but that may well be due to the randomness 
that size-tiered compaction introduces in this case. 

I don't see an easy way to fix this properly, though, without changing the row 
cache into a row+filter cache (which seems to be covered by CASSANDRA-1956). 
Since you need to load the whole row into the row cache, you can't restrict 
the read to named columns. 
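
Purely to illustrate why the current cache shape is the blocker (this is a 
hypothetical sketch, not the CASSANDRA-1956 design): a row+filter cache would 
have to key on the row plus the requested columns, whereas today's row cache 
keys on the row alone and must therefore hold every column:

{noformat}
import java.nio.ByteBuffer;
import java.util.SortedSet;
import org.apache.cassandra.db.DecoratedKey;

// Hypothetical cache key for a row+filter cache: identity is the row *and*
// the set of named columns requested, so a names query could be answered
// from cache without materializing the whole row.
final class RowFilterCacheKey
{
    final DecoratedKey key;                   // which row
    final SortedSet<ByteBuffer> columnNames;  // which named columns were requested

    RowFilterCacheKey(DecoratedKey key, SortedSet<ByteBuffer> columnNames)
    {
        this.key = key;
        this.columnNames = columnNames;
    }

    // equals()/hashCode() over both fields omitted for brevity
}
{noformat}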
                
> SSTables are not updated with max timestamp on upgradesstables/compaction 
> leading to non-optimal performance.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4205
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4205
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: Thorkild Stray 
>            Assignee: Jonathan Ellis
>            Priority: Critical
>             Fix For: 1.0.10, 1.1.1
>
>         Attachments: 4205.txt
>
>
> We upgraded from 0.7.9 to 1.0.7 on a cluster with a heavy update load. After 
> converting all the reads to named column reads instead of get_slice calls, we 
> noticed that we still weren't getting the performance improvements 
> implemented in CASSANDRA-2498. A single named column read was still touching 
> multiple SSTables according to nodetool cfhistograms. 
> To verify whether this was a reporting issue or a real one, we ran multiple 
> tests with stress and saw that it worked as expected. After changing stress 
> so that it ran the read/write test directly against the CF having issues 
> (3 times stress & flush), we noticed that stress also touched multiple 
> SSTables (according to cfhistograms).
> So, the root of the problem is "something" left over from our pre-1.0 days. 
> All SSTables were upgraded with upgradesstables, and have been written and 
> compacted many times since the upgrade (4 months ago). The usage pattern for 
> this CF is that it is constantly read and updated (overwritten), with no 
> deletes. 
> After discussing the problem with Brandon Williams on #cassandra, it seems 
> the problem might be that a max timestamp has never been written for the old 
> SSTables that were upgraded from pre-1.0. They have only been compacted, and 
> the max timestamp is not recorded during compactions. 
> A suggested fix is to special-case this in upgradesstables so that a max 
> timestamp always exists for all SSTables. 
> {panel}
> 06:08 < driftx> thorkild_: tx.  The thing is we don't record the max 
> timestamp on compactions, but we can do it specially for upgradesstables.
> 06:08 < driftx> so, nothing in... nothing out.
> 06:10 < thorkild_> driftx: ah, so when you upgrade from before the metadata 
> was written, and that data is only feed through upgradesstables and 
> compactions -> never properly written?
> 06:10 < thorkild_> that makes sense.
> 06:11 < driftx> right, we never create it, we just reuse it :(
> {panel}
