[ 
https://issues.apache.org/jira/browse/HBASE-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610088#action_12610088
 ] 

Jim Kellerman commented on HBASE-674:
-------------------------------------

There are a number of issues here:
- multiple inserts or deletes for the same row/column/timestamp are each counted 
and can inflate the memcache size somewhat. This may not be a big issue because 
it is unlikely that someone is using the same row/column/timestamp, especially 
if they do not specify a timestamp for puts or deletes.
- because of the inaccuracy described above, subtracting the actual number of 
flushed bytes from the memcache size can let the memcache size grow over time 
if fewer bytes are flushed than HRegion thinks are in the memcache. What we 
really need to do is keep track of both updates and memcache size, so that 
during a flush, we accumulate the size of updates that are taken after the 
snapshot. When the flush is completed, we can set the size of the memcache to 
the number of bytes submitted as updates during the flush.
- why the memcache size seems to be going negative more frequently recently is 
somewhat of a mystery. It is pretty easy to understand why we might flush less 
than what we think is in the cache, but how would we flush more than what we 
think is in the cache?
- Finally, I don't particularly like the finished memcache flush message in 
HRegion. It reports what it thinks is the current memcache size after the 
flush, but doesn't say that. It would lead the casual observer to think that 
the size reported by HRegion after the flush is the number of bytes flushed 
from the cache.
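
The accounting proposed in the second point can be sketched roughly as follows. 
This is only an illustration, not the code in HRegion or in the attached 
patches: the class name MemcacheSizeTracker and all of its methods are 
hypothetical, and it assumes the caller knows the byte size of each update and 
signals when the snapshot is taken and when the flush completes.

```java
/** Hypothetical size tracker for illustration; not the actual HRegion code. */
final class MemcacheSizeTracker {
    private long memcacheSize = 0;        // bytes we believe are in the memcache
    private long updatesDuringFlush = 0;  // bytes submitted after the snapshot
    private boolean flushing = false;

    /** Record a put or delete of the given size in bytes. */
    synchronized void recordUpdate(long bytes) {
        memcacheSize += bytes;
        if (flushing) {
            // This update arrived after the snapshot, so it is not being flushed.
            updatesDuringFlush += bytes;
        }
    }

    /** Call when the snapshot is taken at the start of a flush. */
    synchronized void startFlush() {
        flushing = true;
        updatesDuringFlush = 0;
    }

    /**
     * Call when the flush completes. Rather than subtracting the number of
     * flushed bytes, reset the size to what was submitted during the flush.
     */
    synchronized long finishFlush() {
        memcacheSize = updatesDuringFlush;
        updatesDuringFlush = 0;
        flushing = false;
        return memcacheSize;
    }

    synchronized long size() {
        return memcacheSize;
    }
}
```

Because the size is reset rather than decremented, duplicate 
row/column/timestamp updates counted before the snapshot cannot carry over 
into the next cycle, and the size cannot go negative.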

> memcache size unreliable
> ------------------------
>
>                 Key: HBASE-674
>                 URL: https://issues.apache.org/jira/browse/HBASE-674
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.1.2
>            Reporter: stack
>             Fix For: 0.2.0
>
>         Attachments: 674-v2.patch, 674.patch
>
>
> Multiple updates against the same row/column/ts will be seen as increments to 
> cache size on insert, but when we then play the memcache at flush time, we'll 
> only see the most recent entry and decrement the memcache size by whatever 
> its size is; the memcache size will be off.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
