[jira] [Commented] (CASSANDRA-3143) Global caches (key/row)

Sylvain Lebresne (Commented) (JIRA) Tue, 25 Oct 2011 08:42:54 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135156#comment-13135156
 ]


Sylvain Lebresne commented on CASSANDRA-3143:
---------------------------------------------

I did a few pass over the patch. I haven't applied the patches yet (they need 
rebase anyway) and a few stuffs from the first patches is fixed in the later 
ones, which I realize only when reading those later patches, so even if I've 
tried to update my comments, there may be a few outdated ones, sorry about that.

First, some "top-level" remarks/comments:
* At least for the row cache, I fear this may sometimes be less efficient than 
what we have now, because some cf with less than good hit rate may evict rows 
of cf with very good hit rate, which wouldn't happen in the current 
implementation with reasonably tuned cache sizes. Aren't we screwing people 
that are doing fine tuning in the name of simplicity?
* Preceding point apart, we would at least need a way to deactivate row caching 
on a per-cf basis. We may also want that for key cache, though this seems less 
critical. My initial idea would be to either have a boolean flag if we only 
want to allow disabling row cache, or some multi-value caches option that could 
be "none", "key_only", "row_only" or "all".
* We probably need to have a story as far as upgrading to this patch is 
concerned (cleaning old saved caches, migrating to the new global options, 
...). It's probably fine just leaving instructions in the NEWS file as a start, 
but I'd rather do it with the patch to avoid forgetting.
* I think it'd be better to size the key cache by it's size in bytes rather 
than by number of entries, like for the row cache, since 'size alloted to the 
cache' is really the only measure that make sense for a user.  

Then a bit more specifically on some patches:

patch 2:
* For the Row cache key, we should really use the cfId instead of table and 
cfname.
* I think it would be worth factoring what can be of readSaved. In particular, 
we could create a KeyCacheKey (like RowCacheKey and that would really just be 
the usual pair of (Descriptor, DecoratedKey), have it share an interface with 
RowCacheKey (serialize and deserialize basically) and use that.
* Putting the clone of the key into the constructor of RowCacheKey is 
inefficient, we don't care about cloning when we invalidate or query the cache 
(nor when we deserialize RowCacheKeys).
* nit: not fond of declaration like public abstract Set<?> readSaved(), making 
it harder to know what the method returns just to save a few characters.

patch 3:
* In DatabaseDescriptor:
  {noformat}
    if (CacheService.instance == null)
        logger.error("Could not initialize Cache Service.");
  {noformat}
  I believe the CacheService.instance will load the CacheService that will 
trigger an exception if there is a problem, not set the instance to null. So in 
particular, this message will never be written. That code is moved by the 4th 
patch but the problem remains I think (note that we do want to make sure 
CacheService get loaded quickly to throw an exception if we have an
  initialization problem, it's just not the right way I believe).
* I'm not super fond of that key cache preloading. If the key cache save is 
outdated, we'll have a bunch of uselessly preloaded stuff (not a huge deal 
but...). Maybe we could keep doing as before and just save the set of keys for 
each column family instead. That would needs buffering of the keys during the 
cache save though, but not sure it's such a huge deal and it would reduce the 
size of the saved cache quite a bit.
* In CacheService, setRowCacheSavePeriodInSecond directly schedule the saving 
to the new value, but the get method grabs the value from DatabaseDescriptor, 
so it will always return the initially set value, which is not what we want. I 
think we should keep the Integer that were previously in CFS (but I'm fine 
moving them to CacheService).
* Why does the getRowCacheKeysToSave() option disappeared?

patch 4:
* The patch does the following change:
  {noformat}
  -            int newCapacity = (int) 
(DatabaseDescriptor.getReduceCacheCapacityTo() * size());
  +            int newCapacity = (int) 
(DatabaseDescriptor.getReduceCacheCapacityTo() * getCapacity());
  {noformat}
  but that's not the semantic of the operation. The initial was the right one.
* I really think that making DecoratedKey equals method deal with RowCacheKey 
is asking for trouble. Not sure why it would be useful either?

                
> Global caches (key/row)
> -----------------------
>
>                 Key: CASSANDRA-3143
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Pavel Yaskevich
>            Assignee: Pavel Yaskevich
>            Priority: Minor
>              Labels: Core
>             Fix For: 1.1
>
>         Attachments: 0001-global-key-cache.patch, 
> 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch, 
> 0003-CacheServiceMBean-and-correct-key-cache-loading.patch, 
> 0004-key-row-cache-tests-and-tweaks.patch, 
> 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch
>
>
> Caches are difficult to configure well as ColumnFamilies are added, similar 
> to how memtables were difficult pre-CASSANDRA-2006.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-3143) Global caches (key/row)

Reply via email to