[
https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224904#comment-14224904
]
Vijay commented on CASSANDRA-7438:
----------------------------------
{quote}
sun.misc.Hashing doesn't seem to exist for me, maybe a Java 8 issue?
StatsHolder, same AtomicLongArray suggestion. Also consider LongAdder.
{quote}
Yep, let me find alternatives for Java 8 (and, until we move to Java 8, a substitute for LongAdder).
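For the StatsHolder suggestion, a minimal sketch of what a LongAdder-based counter holder could look like (class and method names here are hypothetical, not the patch's actual API); LongAdder (Java 8+) spreads contended increments across cells, so many writer threads don't serialize on one cache line the way they do with a single AtomicLong:

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stats holder: LongAdder trades a slightly more
// expensive read (sum()) for much cheaper concurrent increments.
public class StatsSketch {
    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();

    public void recordHit()  { hits.increment(); }
    public void recordMiss() { misses.increment(); }

    public long hitCount()  { return hits.sum(); }
    public long missCount() { return misses.sum(); }
}
```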
{quote}
The queue really needs to be bounded, producer and consumer could proceed at
different rates.
In Segment.java in the replace path AtomicLong.addAndGet is called back to
back, could be called once with the math already done. I believe each of those
stalls processing until the store buffers have flushed. The put path does
something similar and could have the same optimization.
{quote}
Yeah, those were an oversight.
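The point in the quote above: each `AtomicLong.addAndGet` is a full read-modify-write that stalls until the store buffers drain, so two back-to-back calls pay that cost twice. A sketch of the fix, with hypothetical names standing in for the Segment accounting:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical memory accounting for a cache segment.
public class MemoryAccounting {
    private final AtomicLong bytesUsed = new AtomicLong();

    // Before: two atomic updates on the replace path, two stalls.
    public long replaceSlow(long oldSize, long newSize) {
        bytesUsed.addAndGet(-oldSize);
        return bytesUsed.addAndGet(newSize);
    }

    // After: do the math up front, pay for one atomic update.
    public long replaceFast(long oldSize, long newSize) {
        return bytesUsed.addAndGet(newSize - oldSize);
    }
}
```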
{quote}
Tasks submitted to executor services via submit will wrap the result including
exceptions in a future which silently discards them.
The library might take at initialization time a listener for these errors, or
if it is going to be C* specific it could use the wrapped runnable or similar.
{quote}
Are you suggesting configurable logging/exception handling in case the two threads throw exceptions? If so, sure. Other exceptions are, AFAIK, already propagated (still needs cleanup, though).
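To make the "wrapped runnable" idea concrete: `ExecutorService.submit` returns a Future, and if nobody calls `get()`, the exception is silently dropped. A sketch of a wrapper that routes the exception to a listener supplied at initialization time (the names here are illustrative, not the patch's API):

```java
// Hypothetical wrapper: reports exceptions that would otherwise
// vanish inside an uninspected Future to a configurable listener.
public class SafeRunnable {
    public static Runnable wrap(Runnable task,
                                java.util.function.Consumer<Throwable> listener) {
        return () -> {
            try {
                task.run();
            } catch (Throwable t) {
                listener.accept(t); // surfaced instead of swallowed
            }
        };
    }
}
```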
{quote}
A lot of locking that was spin locking (which unbounded I don't think is great)
is now blocking locking. There is no adaptive spinning if you don't use
synchronized. If you are already using unsafe maybe you could do monitor
enter/exit. Never tried it.
Having the table (segments) on heap is pretty undesirable to me. Happy to be
proved wrong, but I think a flyweight over off heap would be better.
{quote}
Segments are small in memory so far in my tests. The spin lock is there so that, once acquired, the locker can check whether the segment was rehashed in the meantime; this is better than having a separate, central lock (no different from Java or memcached).
I'm not sure I understand the Unsafe lock suggestion; an example would help.
The segments are on heap mainly to handle the locking. I think we can do a bit of CAS, but a global lock on rehashing will be a problem (maybe an alternate approach is required).
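One middle ground between unbounded spinning and immediate blocking (not the patch's implementation, just a sketch of the trade-off being debated): spin a bounded number of times for the common short-hold case, then park on a blocking lock instead of burning CPU:

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical bounded-spin lock: approximates the adaptive
// spinning that synchronized gives for free.
public class SpinThenBlockLock {
    private static final int SPIN_LIMIT = 100;
    private final ReentrantLock lock = new ReentrantLock();

    public void lock() {
        for (int i = 0; i < SPIN_LIMIT; i++) {
            if (lock.tryLock())
                return;          // won while spinning
        }
        lock.lock();             // give up and park the thread
    }

    public void unlock() {
        lock.unlock();
    }

    public boolean isHeld() {
        return lock.isHeldByCurrentThread();
    }
}
```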
{quote}
It looks like concurrent calls to rehash could cause the table to rehash twice
since the rebalance field is not CASed. You should do the volatile read, and
then attempt the CAS (avoids putting the cache line in exclusive state every
time).
{quote}
Nope, it is a single-threaded executor and the rehash boolean is already volatile :)
The next commit will use conditions instead (similar to the C implementation).
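For reference, the "volatile read, then CAS" pattern the reviewer describes looks like this (a generic sketch, not the patch's code): losing threads only do a shared read and never pull the flag's cache line into exclusive state, while exactly one thread wins the CAS:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical gate ensuring at most one concurrent rehash.
public class RehashGate {
    private final AtomicBoolean rehashing = new AtomicBoolean(false);

    /** Returns true only for the single thread that should rehash. */
    public boolean tryStartRehash() {
        if (rehashing.get())                          // cheap shared read
            return false;
        return rehashing.compareAndSet(false, true);  // one winner
    }

    public void finishRehash() {
        rehashing.set(false);
    }
}
```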
{quote}
If the expiration lock is already locked some other thread is doing the
expiration work. You might keep a semaphore for puts that bypass the lock so
other threads can move on during expiration. I suppose after the first few
evictions new puts will move on anyways. This would show up in a profiler if it
were happening.
{quote}
Good point… or a tryLock to spin and check whether some other thread has released enough memory.
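A sketch of that tryLock idea (the accounting hooks `freeBytes` and `evictSome` are hypothetical stand-ins): a writer needing space attempts the expiration lock; if another thread already holds it, the writer loops and re-checks free memory instead of queueing behind the evictor:

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical eviction coordinator built around tryLock.
public class Evictor {
    private final ReentrantLock expiryLock = new ReentrantLock();

    public boolean ensureCapacity(long needed,
                                  java.util.function.LongSupplier freeBytes,
                                  Runnable evictSome) {
        while (freeBytes.getAsLong() < needed) {
            if (expiryLock.tryLock()) {
                try {
                    evictSome.run();   // we are the evictor
                } finally {
                    expiryLock.unlock();
                }
            }
            // else: another thread is evicting; re-check and move on
        }
        return true;
    }
}
```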
{quote}
hotN looks like it could lock for quite a while (hundreds of milliseconds,
seconds) depending on the size of N. You don't need to use a linked list for
the result just allocate an array list of size N. Maybe hotN should be able to
yield, possibly leaving behind an iterator that evictors will have to repair.
Maybe also depends on how top N handles duplicate or multiple versions of keys.
Alternatively hotN could take a read lock, and writers could skip the cache?
{quote}
We cannot have duplicates in the queue (remember, it is a doubly linked list of the items in the cache). A read lock on q_expiry_lock is all we need; let me fix it.
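Combining both points above (read lock plus the reviewer's pre-sized ArrayList instead of a linked-list result), hotN could take roughly this shape; the key type and queue iteration are placeholders, not the patch's actual structures:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical hotN: copy at most n keys in LRU order under a
// read lock, into a list allocated at its final size.
public class HotN<K> {
    private final ReadWriteLock qExpiryLock = new ReentrantReadWriteLock();

    public List<K> hotN(Iterable<K> lruOrder, int n) {
        qExpiryLock.readLock().lock();
        try {
            List<K> result = new ArrayList<>(n);   // sized up front
            Iterator<K> it = lruOrder.iterator();
            while (result.size() < n && it.hasNext())
                result.add(it.next());
            return result;
        } finally {
            qExpiryLock.readLock().unlock();
        }
    }
}
```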
> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
> Key: CASSANDRA-7438
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Environment: Linux
> Reporter: Vijay
> Assignee: Vijay
> Labels: performance
> Fix For: 3.0
>
> Attachments: 0001-CASSANDRA-7438.patch
>
>
> Currently SerializingCache is partially off heap; keys are still stored in
> the JVM heap as ByteBuffers.
> * There is a higher GC cost for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better
> results, but this requires careful tuning.
> * The memory overhead for cache entries is relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off
> heap and use JNI to interact with the cache. We might want to ensure that the
> new implementation matches the existing API (ICache), and the implementation
> needs safe memory access, low memory overhead, and as few memcpys as
> possible.
> We might also want to make this cache configurable.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)