Is it in trunk too? I'm running a trunk build (from the end of last week) in a cluster and I'm seeing the disk i/o bottleneck.
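For anyone wondering how mmap can cope with a data file much larger than RAM (the question in the quoted thread below): mapping only reserves virtual address space, and the kernel pages data in on first touch and evicts it under memory pressure. A minimal Java sketch, with hypothetical class and argument names, chunked because a single MappedByteBuffer is capped at 2GB:

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    // Sketch only: map a read-only file that may be far larger than RAM.
    public class MmapSketch {
        private static final long CHUNK = Integer.MAX_VALUE; // one map is capped at 2GB

        public static void main(String[] args) throws Exception {
            try (RandomAccessFile raf = new RandomAccessFile(args[0], "r");
                 FileChannel ch = raf.getChannel()) {
                long size = ch.size();
                for (long offset = 0; offset < size; offset += CHUNK) {
                    long length = Math.min(CHUNK, size - offset);
                    MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, offset, length);
                    // A real reader seeks to the rows it needs; only pages
                    // actually touched ever become resident, so a 100GB map
                    // on an 8GB box is fine on a 64-bit JVM.
                    buf.get(0);
                }
            }
        }
    }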
On Fri, Feb 19, 2010 at 1:03 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> mmap is designed to handle that case, yes. It is already in the 0.6 branch.
>
> On Fri, Feb 19, 2010 at 2:44 PM, Weijun Li <weiju...@gmail.com> wrote:
> > I see. How much is the overhead of Java serialization? Does it slow down
> > the system a lot? It seems to be a tradeoff between CPU usage and memory.
> >
> > As for mmap in 0.6, do you mmap the sstable data file even if it is a lot
> > larger than the available memory (e.g., the data file is over 100GB while
> > you have only 8GB of RAM)? How efficient is mmap in that case? Is mmap
> > already checked into the 0.6 branch?
> >
> > -Weijun
> >
> > On Fri, Feb 19, 2010 at 4:56 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> >>
> >> The whole point of the row cache is to avoid the serialization overhead,
> >> though. If we just wanted the serialized form cached, we would let
> >> the OS block cache handle that without adding an extra layer. (0.6
> >> uses mmap'd i/o by default on 64-bit JVMs, so this is very efficient.)
> >>
> >> On Fri, Feb 19, 2010 at 3:29 AM, Weijun Li <weiju...@gmail.com> wrote:
> >> > The memory overhead issue is not directly related to GC, because when
> >> > the JVM ran out of memory the GC had already been very busy for quite
> >> > a while. In my case the JVM consumed all of the 6GB when the row cache
> >> > size hit 1.4 million.
> >> >
> >> > I haven't started testing the row cache feature yet. But I think data
> >> > compression is useful for reducing memory consumption, because in my
> >> > impression disk i/o is always the bottleneck for Cassandra while its
> >> > CPU usage is usually low. In addition, compression should also help to
> >> > reduce the number of Java objects dramatically (correct me if I'm
> >> > wrong), especially in case we need to cache most of the data to
> >> > achieve decent read latency.
> >> >
> >> > If ColumnFamily is serializable, it shouldn't be that hard to
> >> > implement the compression feature, controlled by an option (again :-)
> >> > in storage-conf.xml.
> >> >
> >> > When I get to that point you can instruct me to implement this feature
> >> > along with the row cache write-through. Our goal is straightforward:
> >> > to support short read latency in a high-volume web application with a
> >> > write/read ratio of 1:1.
> >> >
> >> > -Weijun
> >> >
> >> > -----Original Message-----
> >> > From: Jonathan Ellis [mailto:jbel...@gmail.com]
> >> > Sent: Thursday, February 18, 2010 12:04 PM
> >> > To: cassandra-user@incubator.apache.org
> >> > Subject: Re: Testing row cache feature in trunk: write should put
> >> > record in cache
> >> >
> >> > Did you force a GC from jconsole to make sure you weren't just
> >> > measuring uncollected garbage?
> >> >
> >> > On Wed, Feb 17, 2010 at 2:51 PM, Weijun Li <weiju...@gmail.com> wrote:
> >> >> OK, I'll work on the change later, because there's another problem
> >> >> to solve: the cache overhead is so big that 1.4 million records (1KB
> >> >> each) consumed all of the JVM's 6GB of memory (I guess 4GB were
> >> >> consumed by the row cache). I'm thinking that ConcurrentHashMap is
> >> >> not a good choice for LRU, and that the row cache needs to store
> >> >> compressed row data to reduce memory usage. I'll do more
> >> >> investigation on this and let you know.
> >> >>
> >> >> -Weijun
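The quoted message above suggests two fixes at once: evict by recency instead of relying on ConcurrentHashMap (which keeps no access order), and cache compressed serialized bytes instead of live object graphs. A hedged, hypothetical sketch of both ideas combined; it trades CPU on each hit (inflate + deserialize) for a much smaller heap, one byte[] per row instead of thousands of column objects:

    import java.io.ByteArrayOutputStream;
    import java.util.Collections;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.zip.Deflater;

    // Sketch only, not Cassandra code: a bounded LRU cache of
    // deflate-compressed serialized rows.
    public final class CompressedLruCache {
        private final Map<String, byte[]> map;

        public CompressedLruCache(final int capacity) {
            // accessOrder=true iterates least-recently-used first, so
            // removeEldestEntry() evicts true LRU entries; synchronizedMap
            // trades some concurrency for that ordering.
            this.map = Collections.synchronizedMap(
                new LinkedHashMap<String, byte[]>(capacity, 0.75f, true) {
                    @Override
                    protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                        return size() > capacity;
                    }
                });
        }

        // "serializedRow" stands in for whatever turns a row into bytes.
        public void put(String key, byte[] serializedRow) {
            Deflater deflater = new Deflater(Deflater.BEST_SPEED);
            deflater.setInput(serializedRow);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream(serializedRow.length / 2 + 16);
            byte[] buf = new byte[4096];
            while (!deflater.finished())
                out.write(buf, 0, deflater.deflate(buf));
            deflater.end();
            map.put(key, out.toByteArray());
        }

        public byte[] get(String key) {
            return map.get(key); // caller inflates and deserializes on a hit
        }
    }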
> >> >>
> >> >> On Tue, Feb 16, 2010 at 9:22 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> >> >>>
> >> >>> ... tell you what, if you write the option-processing part in
> >> >>> DatabaseDescriptor I will do the actual cache part. :)
> >> >>>
> >> >>> On Tue, Feb 16, 2010 at 11:07 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> >> >>> > https://issues.apache.org/jira/secure/CreateIssue!default.jspa,
> >> >>> > but this is pretty low priority for me.
> >> >>> >
> >> >>> > On Tue, Feb 16, 2010 at 8:37 PM, Weijun Li <weiju...@gmail.com> wrote:
> >> >>> >> Just tried to make a quick change to enable it, but it didn't
> >> >>> >> work out :-(
> >> >>> >>
> >> >>> >>     ColumnFamily cachedRow = cfs.getRawCachedRow(mutation.key());
> >> >>> >>
> >> >>> >>     // What I modified
> >> >>> >>     if (cachedRow == null) {
> >> >>> >>         cfs.cacheRow(mutation.key());
> >> >>> >>         cachedRow = cfs.getRawCachedRow(mutation.key());
> >> >>> >>     }
> >> >>> >>
> >> >>> >>     if (cachedRow != null)
> >> >>> >>         cachedRow.addAll(columnFamily);
> >> >>> >>
> >> >>> >> How can I open a ticket for you to make the change (enable row
> >> >>> >> cache write-through with an option)?
> >> >>> >>
> >> >>> >> Thanks,
> >> >>> >> -Weijun
> >> >>> >>
> >> >>> >> On Tue, Feb 16, 2010 at 5:20 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> >> >>> >>>
> >> >>> >>> On Tue, Feb 16, 2010 at 7:17 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> >> >>> >>> > On Tue, Feb 16, 2010 at 7:11 PM, Weijun Li <weiju...@gmail.com> wrote:
> >> >>> >>> >> Just started to play with the row cache feature in trunk: it
> >> >>> >>> >> seems to be working fine so far, except that for the
> >> >>> >>> >> RowsCached parameter you need to specify the number of rows
> >> >>> >>> >> rather than a percentage (e.g., "20%" doesn't work).
> >> >>> >>> >
> >> >>> >>> > 20% works, but it's 20% of the rows at server startup. So on
> >> >>> >>> > a fresh start that is zero.
> >> >>> >>> >
> >> >>> >>> > Maybe we should just get rid of the % feature...
> >> >>> >>>
> >> >>> >>> (Actually, it shouldn't be hard to update this on flush, if you
> >> >>> >>> want to open a ticket.)
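For reference, the write-through change discussed in the middle of the thread might look roughly like this. It is a hedged sketch only: the ColumnFamilyStore method names are taken from Weijun's snippet, while getRowCacheWriteThrough() stands in for the hypothetical option-processing in DatabaseDescriptor that Jonathan asked for:

    // Sketch only, not the committed fix: apply a write to the row cache,
    // populating the cache from disk on a miss so the cached copy also
    // contains the columns that were already on disk.
    private static void applyWriteThrough(ColumnFamilyStore cfs,
                                          RowMutation mutation,
                                          ColumnFamily columnFamily)
    {
        if (!DatabaseDescriptor.getRowCacheWriteThrough()) // hypothetical option
            return;
        ColumnFamily cachedRow = cfs.getRawCachedRow(mutation.key());
        if (cachedRow == null) {
            cfs.cacheRow(mutation.key()); // read + populate from disk first
            cachedRow = cfs.getRawCachedRow(mutation.key());
        }
        if (cachedRow != null)
            cachedRow.addAll(columnFamily); // then merge the incoming columns
    }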