[jira] [Commented] (CASSANDRA-3143) Global caches (key/row)

Sylvain Lebresne (Commented) (JIRA) Tue, 13 Dec 2011 15:26:01 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168849#comment-13168849
 ]


Sylvain Lebresne commented on CASSANDRA-3143:
---------------------------------------------

{quote}
bq. Preceding point apart, we would at least need a way to deactivate row 
caching on a per-cf basis. We may also want that for key cache, though this 
seems less critical. My initial idea would be to either have a boolean flag if 
we only want to allow disabling row cache, or some multi-value caches option 
that could be "none", "key_only", "row_only" or "all".

This is going to be moved to the separate task.
{quote}

I'm not a fan of that idea. We just cannot release this without a way to 
deactivate the row cache as this would make the row cache unusable for most 
users. IMHO, that's a good definition of something that should not be moved to 
a separate task.

{quote}
bq. Why does the getRowCacheKeysToSave() option disappeared?

Because we don't control that anymore, rely on cache LRU policy instead.
{quote}

I don't understand how "relying on cache LRU policy" has anything to do with 
that. The initial motivation for that option is that people don't want to 
reload the full extend of the row cache on restart because it takes forever, 
but they don't want to start with cold caches either. I don't see how making 
the cache global changes anything on that. I agree that the number of row cache 
key to save should now be a global option, but I disagree that it should be 
removed.

Otherwise:
* The code around CFS.prepareRowForCaching is weird. First the comment seems to 
suggest that prepareRowForCaching is used exclusively from CacheService while 
it's use below in cacheRow. It also adds a copy of the columns which I don't 
think is necessary since we already copy in MappedFileDataInput.  Overall I'm 
not sure prepareRowForCaching is useful and CacheService.readSavedRowCache 
could use cacheRow directly
* I don't think CacheService.reloadKeyCache works correctly. It only populate 
the cache with fake values that won't get updated unless you reload the 
sstables, which has no reason to happen. I'm fine removing the key cache 
reloading altogether, but as an alternative, why not save the value of the key 
cache too? The thing is, I'm not very comfortable with the current 'two phase' 
key cache loading: if we ever have a bug in the SSTReader.load method, the 
actual pre-loading with -1 values will be harmful, which seems unnecessarily 
fragile. Saving the values on disk would avoid that.
* Talking of the key cache save, the format used by the patch is really really 
not compact. For each key we save the full path to the sstable, which can 
easily be > 50 bytes. Maybe we could associate an int to each descriptor during 
the save and save the association of descriptor -> id separately.  * Still 
worth allowing to chose how may keys to save
* The cache sizings don't take the keys into account. For the row cache, one 
could make the argument that the overhead of the keys is negligible compared to 
the values. For the key cache however, the key are bigger than the values.
* The patch mistakenly remove the help for 'nodetool upgradesstables' (in 
NodeCmd.java)
* Would be worth adding a global cache log line in StatusLogger.
* Patch wrongly reintroduces memtable_operations and memtable_throughput to 
CliHelp.
* The default row cache provider since 1.0 is the serializing one, this patch 
sets the ConcurrentLinkedHashCacheProvider instead.

And a number of nits:
* In CFS, it's probably faster/simpler to use metadata.cfId rather than 
Schema.instance.getId(table.name, this.columnFamily)
* In CacheService, calling scheduleSaving with -1 as second argument would be 
slightly faster than using Integer.MAX_VALUE.
* In SSTableReader.cacheKey, the assert {{key.key == null}} is useless in trunk 
(DK with key == null can't be constructed).
* In AbstractCassandraDaemon, there's a unecessary import of 
javax.management.RuntimeErrorException
* There is some comments duplication in the yaml file
* I wonder if the reduce cache capacity thing still makes sense after this 
patch?
* In AutosavingCache, I think we could declare AutoSavingCache<K extends 
CacheKey, V> and get rid of the translateKey() method

                
> Global caches (key/row)
> -----------------------
>
>                 Key: CASSANDRA-3143
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Pavel Yaskevich
>            Assignee: Pavel Yaskevich
>            Priority: Minor
>              Labels: Core
>             Fix For: 1.1
>
>         Attachments: 0001-global-key-cache.patch, 
> 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch, 
> 0003-CacheServiceMBean-and-correct-key-cache-loading.patch, 
> 0004-key-row-cache-tests-and-tweaks.patch, 
> 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch, 
> 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch
>
>
> Caches are difficult to configure well as ColumnFamilies are added, similar 
> to how memtables were difficult pre-CASSANDRA-2006.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-3143) Global caches (key/row)

Reply via email to