[
https://issues.apache.org/jira/browse/HBASE-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610088#action_12610088
]
Jim Kellerman commented on HBASE-674:
-------------------------------------
There are a number of issues here:
- multiple inserts or deletes for the same row/column/timestamp are each counted
and can inflate the memcache size somewhat. This may not be a big issue because
it is unlikely that someone reuses the same row/column/timestamp, especially if
they do not specify a timestamp for puts or deletes.
- because of the inaccuracies above, subtracting the actual number of flushed
bytes from the memcache size can let the memcache size grow over time if fewer
bytes are flushed than HRegion thinks are in the memcache. What we really need
to do is keep track of both updates and memcache size, so that during a flush
we accumulate the size of updates that arrive after the snapshot is taken. When
the flush is completed, we can set the size of the memcache to the number of
bytes submitted as updates during the flush.
- why the memcache size seems to be going negative more frequently recently is
somewhat of a mystery. It is pretty easy to understand why we might flush less
than what we think is in the cache, but how would we flush more than what we
think is in the cache?
- Finally, I don't particularly like the finished memcache flush message in
HRegion. It reports what it thinks is the current memcache size after the
flush, but doesn't say that. It would lead the casual observer to think that
the size reported by HRegion after the flush is the number of bytes flushed
from the cache.
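
The accounting proposed in the second point could look roughly like the Java
sketch below. It is only an illustration: the class, field, and method names
are hypothetical and are not taken from HRegion or from the attached patches,
a String key stands in for row/column/timestamp, and the size of an update is
approximated as key length plus value length.

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical sketch of the proposed size accounting; not HRegion code. */
class MemcacheSizeSketch {
    // The String key stands in for row/column/timestamp (an assumption).
    private SortedMap<String, byte[]> memcache = new TreeMap<String, byte[]>();
    private long memcacheSize = 0;        // estimate used to decide when to flush
    private long updatesDuringFlush = 0;  // bytes submitted after the snapshot
    private boolean flushInProgress = false;

    /** Records an update and counts its approximate size. */
    public synchronized void update(String key, byte[] value) {
        long size = key.length() + value.length;
        memcache.put(key, value);  // a repeated key overwrites, yet is counted again
        memcacheSize += size;
        if (flushInProgress) {
            updatesDuringFlush += size;
        }
    }

    /** Starts a flush: hands back the current contents and starts a fresh map. */
    public synchronized SortedMap<String, byte[]> snapshot() {
        SortedMap<String, byte[]> snap = memcache;
        memcache = new TreeMap<String, byte[]>();
        updatesDuringFlush = 0;
        flushInProgress = true;
        return snap;
    }

    /**
     * Ends a flush. Instead of subtracting the flushed byte count, the size is
     * set to the bytes submitted since the snapshot, so errors cannot accumulate.
     */
    public synchronized long finishFlush() {
        memcacheSize = updatesDuringFlush;
        flushInProgress = false;
        return memcacheSize;
    }

    public synchronized long getMemcacheSize() {
        return memcacheSize;
    }

    /** Bytes actually present in a snapshot, using the same size approximation. */
    public static long flushedBytes(SortedMap<String, byte[]> snap) {
        long total = 0;
        for (SortedMap.Entry<String, byte[]> e : snap.entrySet()) {
            total += e.getKey().length() + e.getValue().length;
        }
        return total;
    }
}
```

In this sketch, two updates to the same key are counted twice while the
snapshot holds only one entry, which is the inflation described in the first
point. Because finishFlush() resets the size rather than subtracting the
flushed bytes, that discrepancy is discarded at every flush and the size
cannot go negative.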
> memcache size unreliable
> ------------------------
>
> Key: HBASE-674
> URL: https://issues.apache.org/jira/browse/HBASE-674
> Project: Hadoop HBase
> Issue Type: Bug
> Affects Versions: 0.1.2
> Reporter: stack
> Fix For: 0.2.0
>
> Attachments: 674-v2.patch, 674.patch
>
>
> Multiple updates against the same row/column/ts will be seen as increments to
> the cache size on insert, but when we then play the memcache at flush time,
> we'll only see the most recent entry and decrement the memcache size by
> whatever its size is; the memcache size will be off.