[ https://issues.apache.org/jira/browse/HBASE-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920235#action_12920235 ]
Kannan Muthukkaruppan commented on HBASE-3103:
----------------------------------------------
We are based on the 0.89 branch.
One update: I removed all the instrumentation I had added. There was also some
noise earlier whose source I can't quite pin down, but it is no longer
happening. The times I am seeing now are about 17 seconds for compacting the
100M storefile. Much of the cost in the test seemed to be in the
compress/decompress/bloom logic. (HBASE-2997 doesn't seem directly related to
these areas, so I'm wondering whether it will matter for this case, but I will
check it out.) Also: with blooms turned off, the times are about 11 seconds.
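
As a rough sanity check on these timings, the arithmetic can be sketched as
below. It uses only the data shape from the issue description quoted further
down (10k keys with 1k columns each, 30-byte keys, 45-byte values). The class
and method names are hypothetical, and treating "100M"/"800M" as megabytes is
an assumption.

```java
// Back-of-the-envelope figures for the compaction test in this thread.
// Hypothetical helper, not HBase code. Assumes "M" means megabytes.
final class CompactionMath {
    /** Total cells = rows * columns per row. */
    static long cellCount(long rows, long columnsPerRow) {
        return rows * columnsPerRow;
    }

    /** Raw key+value payload in bytes, ignoring per-cell framing overhead. */
    static long payloadBytes(long cells, int avgKeyBytes, int avgValueBytes) {
        return cells * (avgKeyBytes + avgValueBytes);
    }

    /** Average wall-clock time spent per cell, in microseconds. */
    static double microsPerCell(double seconds, long cells) {
        return seconds * 1_000_000.0 / cells;
    }

    /** Throughput in megabytes per second. */
    static double megabytesPerSecond(double megabytes, double seconds) {
        return megabytes / seconds;
    }

    public static void main(String[] args) {
        long cells = cellCount(10_000, 1_000);
        System.out.println("cells: " + cells);
        System.out.println("payload bytes: " + payloadBytes(cells, 30, 45));
        System.out.printf("blooms on:  %.2f us/cell, %.1f MB/s uncompressed%n",
                microsPerCell(17.0, cells), megabytesPerSecond(800.0, 17.0));
        System.out.printf("blooms off: %.2f us/cell, %.1f MB/s uncompressed%n",
                microsPerCell(11.0, cells), megabytesPerSecond(800.0, 11.0));
    }
}
```

That works out to 10 million cells and about 750 MB of raw key/value payload,
which is consistent with the ~800M uncompressed figure. At 17 seconds it is
roughly 1.7 microseconds per cell, or about 47 MB/s of uncompressed data.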
Nicolas pointed to this bmdiff/zippy link:
http://feedblog.org/2008/10/12/google-bigtable-compression-zippy-and-bmdiff/.
Has anyone tried this out with HBase?
> investigate/improve compaction performance
> ------------------------------------------
>
> Key: HBASE-3103
> URL: https://issues.apache.org/jira/browse/HBASE-3103
> Project: HBase
> Issue Type: Improvement
> Reporter: Kannan Muthukkaruppan
> Attachments: profiler_data.jpg
>
>
> I was running some tests and am seeing that major compacting about 100M of
> data seems to take around 40-50 seconds.
> My simplified test case is something like:
> * Created about a 100M store file (800M uncompressed).
> * 10k keys with 1k columns each (avg. key size: 30 bytes; avg. value size: 45
> bytes)
> * Compression and ROWCOL bloom were turned on.
> The test was to major compact this single store file into a new file.
> Added some nanoTime() calls around these three stages:
> * Scanner.next operations
> * bloom computation logic in: StoreFile:append()
> * StoreFile.Writer.append()
> This is what I saw for these three stages:
> {code}
> 2010-10-11 11:25:39,774 INFO org.apache.hadoop.hbase.regionserver.Store: major Compaction scanTime (ns) 4338103000
> 2010-10-11 11:25:39,774 INFO org.apache.hadoop.hbase.regionserver.Store: major Compaction bloom only time (ns) 14433821000
> 2010-10-11 11:25:39,774 INFO org.apache.hadoop.hbase.regionserver.Store: major Compaction append time (ns) 23191478000
> {code}
> The HFile.getReadTime() and HFile.getWriteTime() values themselves seem
> pretty low (under 1 second). These are the times for the parts that interact
> with the DFS (readBlock() and finishBlock() mostly).
> Are these numbers roughly in line with what others normally see?
> I will double-check my instrumentation and try to get more data. I might also
> run it under a profiler, but I wanted to put this out there for additional
> input/ideas on improvement.
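
For readers who want to reproduce this kind of measurement, here is a minimal
sketch of per-stage timing with System.nanoTime(). It is not the
instrumentation used in the issue (that code is not shown in this thread).
StageTimer and its methods are hypothetical names, and the stage labels are
only illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of HBase: accumulates elapsed
// nanoseconds per named stage across many calls.
final class StageTimer {
    private final Map<String, Long> totals = new LinkedHashMap<>();

    /** Runs the task and adds its elapsed time to the named stage. */
    void time(String stage, Runnable task) {
        long start = System.nanoTime();
        try {
            task.run();
        } finally {
            // Accumulate even if the task throws.
            totals.merge(stage, System.nanoTime() - start, Long::sum);
        }
    }

    /** Total nanoseconds recorded for a stage, or 0 if never timed. */
    long totalNanos(String stage) {
        return totals.getOrDefault(stage, 0L);
    }

    /** Converts a nanosecond total, as logged above, to seconds. */
    static double toSeconds(long nanos) {
        return nanos / 1_000_000_000.0;
    }
}
```

With this conversion, the three logged values come to roughly 4.3, 14.4, and
23.2 seconds. Note that wrapping every cell in two nanoTime() calls adds its
own overhead when there are millions of cells, so totals gathered this way are
best read as relative rather than absolute.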