> But what is the upper bound? And rules of thumb?
If you are using the off-heap cache the upper bound is memory. If you are using 
the on-heap cache it's the JVM heap. 
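For reference, a minimal sketch of the relevant cassandra.yaml knobs (setting names as they appear in the 1.1/1.2 era; the values here are purely illustrative, not recommendations):

```yaml
# Row cache sizing and persistence (Cassandra 1.1/1.2-era settings).
row_cache_size_in_mb: 200        # 0 disables the row cache
row_cache_save_period: 14400     # seconds between saves; 0 disables saving
# SerializingCacheProvider        = off-heap, bounded by available memory
# ConcurrentLinkedHashCacheProvider = on-heap, bounded by the JVM heap
row_cache_provider: SerializingCacheProvider
```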

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/11/2012, at 2:35 PM, Manu Zhang <owenzhang1...@gmail.com> wrote:

> Did that take into account the token, the row key, and the row payload, and 
> the Java memory overhead?
> How could I watch the heap usage then, since jconsole is not able to connect 
> to Cassandra at that time?
> 
>  Try deleting the saved cache and restarting.
> there is no problem for me to do so. But what is the upper bound? And rules 
> of thumb?
> 
> 
> On Sat, Nov 17, 2012 at 9:15 AM, aaron morton <aa...@thelastpickle.com> wrote:
>> Just curious why do you think row key will take 300 byte? 
> That's what I thought it said earlier in the email thread. 
> 
>>  If the row key is Long type, doesn't it take 8 bytes?
> Yes, 8 bytes on disk. 
>  
>> In his case, the rowCache was 500M with 1.6M rows, so the row data is 300B. 
>> Did I miss something?
> 
> 
> Did that take into account the token, the row key, and the row payload, and 
> the Java memory overhead?
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 16/11/2012, at 9:35 AM, Wei Zhu <wz1...@yahoo.com> wrote:
> 
>> Just curious why do you think row key will take 300 byte? If the row key is 
>> Long type, doesn't it take 8 bytes?
>> In his case, the rowCache was 500M with 1.6M rows, so the row data is 300B. 
>> Did I miss something?
>> 
>> Thanks.
>> -Wei
>> 
>> From: aaron morton <aa...@thelastpickle.com>
>> To: user@cassandra.apache.org 
>> Sent: Thursday, November 15, 2012 12:15 PM
>> Subject: Re: unable to read saved rowcache from disk
>> 
>> For a row cache of 1,650,000:
>> 
>> 16 byte token
>> 300 byte row key ? 
>> and row data ? 
>> multiply by a Java fudge factor of 5 or 10. 
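Spelled out, and hedged heavily since the per-row payload size and the JVM overhead factor are guesses, that estimate looks like:

```python
# Back-of-envelope in-memory size of the row cache described in this thread.
# All inputs are assumptions: 1,650,000 rows, a 16-byte token, ~300 bytes of
# key + payload per row, and a 5-10x JVM object-overhead fudge factor.
rows = 1_650_000
per_row = 16 + 300  # token + key/payload, a guess
for fudge in (5, 10):
    mb = rows * per_row * fudge / (1024 * 1024)
    print(f"fudge {fudge}x: ~{mb:.0f} MB")
# fudge 5x:  ~2486 MB
# fudge 10x: ~4972 MB
```

If numbers in that ballpark are right, an OOM on a 3 GB heap well before all 1.65 M rows are read back is exactly what you would expect.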
>> 
>> Try deleting the saved cache and restarting.
>> 
>> Cheers
>>  
>> 
>> 
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 15/11/2012, at 8:20 PM, Wz1975 <wz1...@yahoo.com> wrote:
>> 
>>> Before shutdown, you saw the row cache had 500 MB across 1.6 M rows, so 
>>> each row averages about 300 B, and 700 K rows should be a little over 
>>> 200 MB, unless it is reading more, maybe tombstones? Or the rows on disk 
>>> have grown for some reason, but the row cache was not updated? Something 
>>> else could be eating up the memory. You may want to profile and see what 
>>> consumes it. 
>>> 
>>> 
>>> Thanks.
>>> -Wei
>>> 
>>> Sent from my Samsung smartphone on AT&T 
>>> 
>>> 
>>> -------- Original message --------
>>> Subject: Re: unable to read saved rowcache from disk 
>>> From: Manu Zhang <owenzhang1...@gmail.com> 
>>> To: user@cassandra.apache.org 
>>> CC: 
>>> 
>>> 
>>> 3G, other jvm parameters are unchanged. 
>>> 
>>> 
>>> On Thu, Nov 15, 2012 at 2:40 PM, Wz1975 <wz1...@yahoo.com> wrote:
>>> How big is your heap?  Did you change the jvm parameter? 
>>> 
>>> 
>>> 
>>> Thanks.
>>> -Wei
>>> 
>>> Sent from my Samsung smartphone on AT&T 
>>> 
>>> 
>>> -------- Original message --------
>>> Subject: Re: unable to read saved rowcache from disk 
>>> From: Manu Zhang <owenzhang1...@gmail.com> 
>>> To: user@cassandra.apache.org 
>>> CC: 
>>> 
>>> 
>>> add a counter and print out myself
>>> 
>>> 
>>> On Thu, Nov 15, 2012 at 1:51 PM, Wz1975 <wz1...@yahoo.com> wrote:
>>> Curious where did you see this? 
>>> 
>>> 
>>> Thanks.
>>> -Wei
>>> 
>>> Sent from my Samsung smartphone on AT&T 
>>> 
>>> 
>>> -------- Original message --------
>>> Subject: Re: unable to read saved rowcache from disk 
>>> From: Manu Zhang <owenzhang1...@gmail.com> 
>>> To: user@cassandra.apache.org 
>>> CC: 
>>> 
>>> 
>>> OOM at deserializing the 747,321st row
>>> 
>>> 
>>> On Thu, Nov 15, 2012 at 9:08 AM, Manu Zhang <owenzhang1...@gmail.com> wrote:
>>> oh, as for the number of rows, it's 1,650,000. How long would you expect 
>>> it to take to read back?
>>> 
>>> 
>>> On Thu, Nov 15, 2012 at 3:57 AM, Wei Zhu <wz1...@yahoo.com> wrote:
>>> Good information, Edward. 
>>> In my case, we have a good amount of RAM (76 GB) and the heap is 8 GB, so 
>>> I set the row cache to 800 MB as recommended. Our columns are kind of big, 
>>> so the hit ratio for the row cache is around 20%; according to DataStax, 
>>> we might just turn the row cache off altogether. 
>>> Anyway, on restart it took about 2 minutes to load the row cache:
>>> 
>>>  INFO [main] 2012-11-14 11:43:29,810 AutoSavingCache.java (line 108) 
>>> reading saved cache /var/lib/cassandra/saved_caches/XXX-f2-RowCache
>>>  INFO [main] 2012-11-14 11:45:12,612 ColumnFamilyStore.java (line 451) 
>>> completed loading (102801 ms; 21125 keys) row cache for XXX.f2 
>>> 
>>> Just for comparison, our key is a Long, and the disk usage for the row 
>>> cache is 253 KB. (Only the keys are stored when the row cache is saved to 
>>> disk, so 253 KB / 8 bytes = 31,625 keys.) It's about right...
>>> So for 15 MB, there could be a lot of "narrow" rows (if the key is a Long, 
>>> could be more than 1 M rows).
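The key-count arithmetic above, spelled out (assuming fixed 8-byte Long keys and the decimal KB/MB used in the thread):

```python
# Saved row cache files contain only the keys, so for fixed-size keys the
# file size divided by the key size approximates the number of cached rows.
KEY_BYTES = 8  # a Long key
print(253_000 // KEY_BYTES)     # 253 KB file -> 31625 keys
print(15_000_000 // KEY_BYTES)  # 15 MB file  -> 1875000 keys
```

The second figure is consistent with the 1.65 M row count reported elsewhere in the thread.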
>>>   
>>> Thanks.
>>> -Wei
>>> From: Edward Capriolo <edlinuxg...@gmail.com>
>>> To: user@cassandra.apache.org 
>>> Sent: Tuesday, November 13, 2012 11:13 PM
>>> Subject: Re: unable to read saved rowcache from disk
>>> 
>>> http://wiki.apache.org/cassandra/LargeDataSetConsiderations
>>> 
>>> A negative side-effect of a large row-cache is start-up time. The
>>> periodic saving of the row cache information only saves the keys that
>>> are cached; the data has to be pre-fetched on start-up. On a large
>>> data set, this is probably going to be seek-bound and the time it
>>> takes to warm up the row cache will be linear with respect to the row
>>> cache size (assuming sufficiently large amounts of data that the seek
>>> bound I/O is not subject to optimization by disks)
>>> 
>>> Assuming a row cache of 15 MB and an average row of 300 bytes, that could
>>> be 50,000 entries. 4 hours seems like a long time to read back 50K
>>> entries, unless the source table was very large and you can only do a
>>> small number of reads/sec.
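A rough sketch of why warm-up time blows up when it is seek-bound (the entry count comes from the 15 MB / ~300 B estimate above; the reads-per-second figures are hypothetical):

```python
# Warm-up time for a saved row cache when each cached row is a random read.
entries = 15 * 1024 * 1024 // 300  # ~52,000 entries, assuming ~300 B rows
for reads_per_sec in (100, 10, 3.5):
    minutes = entries / reads_per_sec / 60
    print(f"{reads_per_sec:>5} reads/s -> ~{minutes:.0f} min")
# 100 reads/s -> ~9 min; 10 reads/s -> ~87 min; 3.5 reads/s -> ~250 min
```

So at a few random reads per second on a large, cold data set, a multi-hour warm-up is plausible.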
>>> 
>>> On Tue, Nov 13, 2012 at 9:47 PM, Manu Zhang <owenzhang1...@gmail.com> wrote:
>>> > "incorrect"... what do you mean? I think it's only 15MB, which is not big.
>>> >
>>> >
>>> > On Wed, Nov 14, 2012 at 10:38 AM, Edward Capriolo <edlinuxg...@gmail.com>
>>> > wrote:
>>> >>
>>> >> Yes, the row cache "could be" incorrect, so on startup Cassandra 
>>> >> verifies the saved row cache by re-reading it. That takes a long time, 
>>> >> so do not save a big row cache.
>>> >>
>>> >>
>>> >> On Tuesday, November 13, 2012, Manu Zhang <owenzhang1...@gmail.com> 
>>> >> wrote:
>>> >> > I have a row cache provided by SerializingCacheProvider.
>>> >> > The data that has been read into it is about 500 MB, as claimed by 
>>> >> > jconsole. After saving the cache, it is around 15 MB on disk, so I 
>>> >> > suppose the size reported by jconsole is before serializing.
>>> >> > Now, while restarting Cassandra, it's unable to read the saved row 
>>> >> > cache back. By "unable", I mean it runs for around 4 hours and I have 
>>> >> > to abort it and remove the cache so as not to hold up other tasks.
>>> >> > Since the data aren't huge, why can't Cassandra read it back?
>>> >> > My Cassandra is 1.2.0-beta2.
>>> >
>>> >
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 9
>> 
>> 
>> 
> 
> 
