Re: Why isn't there a separate JVM per table?

Jon Haddad Thu, 22 Feb 2018 14:11:09 -0800

Yeah, I’m in the "compaction in its own JVM" camp, in an ideal world where we’re 
isolating the crazy GC-churning parts of the DB.  It would mean reworking how tasks 
are created and removing all shared state in favor of messaging + a smarter 
manager, which imo would be a good idea regardless.


It might be a better use of time (especially for 4.0) to do some GC performance 
profiling and cut down on the allocations, since that doesn’t involve a massive 
effort.  

I’ve been meaning to do a little benchmarking and profiling for a while now, 
and it seems like a few others have the same inclination.  Maybe now is 
a good time to coordinate that.  A nice perf bump for 4.0 would be very 
rewarding.
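
As a rough starting point for that kind of profiling, here is a minimal sketch 
that reads the JVM's standard GC counters before and after a piece of work.  It 
is not taken from the Cassandra codebase: the GcSnapshot class and its methods 
are hypothetical names used only for illustration, and the numbers cover the 
whole JVM, not just the calling thread.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not part of Cassandra): sums the JVM-wide GC counters
// exposed by the standard java.lang.management API.
final class GcSnapshot {
    final long collections; // total number of collections across all collectors
    final long millis;      // approximate accumulated collection time, in ms

    private GcSnapshot(long collections, long millis) {
        this.collections = collections;
        this.millis = millis;
    }

    // Capture the current totals across every registered collector.
    static GcSnapshot take() {
        long count = 0;
        long time = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 when the collector does not report it
            long t = gc.getCollectionTime();  // -1 when the collector does not report it
            if (c > 0) count += c;
            if (t > 0) time += t;
        }
        return new GcSnapshot(count, time);
    }

    // Difference between this snapshot and an earlier one.
    GcSnapshot since(GcSnapshot earlier) {
        return new GcSnapshot(collections - earlier.collections, millis - earlier.millis);
    }

    // Run a workload and report the GC activity that happened while it ran.
    // Other threads' garbage is included, so treat the result as a coarse signal.
    static GcSnapshot measure(Runnable workload) {
        GcSnapshot before = take();
        workload.run();
        return take().since(before);
    }
}
```

Wrapping a suspect code path as GcSnapshot.measure(() -> runTheWork()), where 
runTheWork() stands in for whatever is being profiled, gives a quick before/after 
comparison when trying to cut allocations.  A real profiler or GC logs will be 
needed to attribute the churn to specific call sites.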

Jon

> On Feb 22, 2018, at 2:00 PM, Nate McCall <zznat...@gmail.com> wrote:
> 
> I've heard a couple of folks pontificate on compaction in its own
> process as well, given it has such a high impact on GC. Not sure about
> the value of individual tables. Interesting idea though.
> 
> On Fri, Feb 23, 2018 at 10:45 AM, Gary Dusbabek <gdusba...@gmail.com> wrote:
>> I've given it some thought in the past. In the end, I usually talk myself
>> out of it because I think it increases the surface area for failure. That
>> is, managing N processes is more difficult than managing one process. But
>> if the additional failure modes are addressed, there are some interesting
>> possibilities.
>> 
>> For example, having gossip in its own process would decrease the odds that
>> a node is marked dead because STW GC is happening in the storage JVM. On
>> the flipside, you'd need checks to make sure that the gossip process can
>> recognize when the storage process has died vs just running a long GC.
>> 
>> I don't know that I'd go so far as to have separate processes for
>> keyspaces, etc.
>> 
>> There is probably some interesting work that could be done to support the
>> orgs who run multiple cassandra instances on the same node (multiple
>> gossipers in that case is at least a little wasteful).
>> 
>> I've also played around with using domain sockets for IPC inside of
>> cassandra. I never ran a proper benchmark, but there were some throughput
>> advantages to this approach.
>> 
>> Cheers,
>> 
>> Gary.
>> 
>> 
>> On Thu, Feb 22, 2018 at 8:39 PM, Carl Mueller <carl.muel...@smartthings.com>
>> wrote:
>> 
>>> GC pauses may have been improved in newer releases, since we are in 2.1.x,
>>> but I was wondering why cassandra uses one jvm for all tables and
>>> keyspaces, intermingling the heap for on-JVM objects.
>>> 
>>> ... so why doesn't cassandra spin off a jvm per table, so each jvm and its
>>> gc can be tuned per table and gc pauses in one table don't impact other
>>> tables? It would probably increase the number of endpoints if we avoid
>>> having an overarching query router.
>>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 


