investigate/improve compaction performance
------------------------------------------

                 Key: HBASE-3103
                 URL: https://issues.apache.org/jira/browse/HBASE-3103
             Project: HBase
          Issue Type: Improvement
            Reporter: Kannan Muthukkaruppan


I was running some tests and am seeing that major compacting about 100M of data 
takes around 40-50 seconds. 

My simplified test case is something like:

* Created about a 100M store file (800M uncompressed).
* 10k keys with 1k columns each (avg. key size: 30 bytes; avg. value size: 45 bytes).
* Compression and ROWCOL bloom were turned on (see the setup sketch after this list).
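
For context, a minimal sketch of the column-family setup, assuming the 0.90-era 
client API; the table/family names are hypothetical, and GZ stands in for 
whichever codec the test actually used:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.hfile.Compression;
import org.apache.hadoop.hbase.regionserver.StoreFile;

public class SetupPerfTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTableDescriptor desc = new HTableDescriptor("perftest");  // hypothetical name
    HColumnDescriptor family = new HColumnDescriptor("f");     // hypothetical name
    family.setCompressionType(Compression.Algorithm.GZ);       // codec is a guess
    family.setBloomFilterType(StoreFile.BloomType.ROWCOL);     // row+column bloom
    desc.addFamily(family);
    new HBaseAdmin(conf).createTable(desc);
  }
}
{code}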

The test was to major compact this single store file into a new file.

Added some nanoTime() calls around these three stages:

* Scanner.next operations
* bloom computation logic in StoreFile.Writer.append()
* StoreFile.Writer.append() itself (see the instrumentation sketch after this list)
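
Roughly, the instrumentation looks like the following; a minimal sketch of the 
compaction loop, not the actual Store.compact() code, with hypothetical names. 
The bloom-only timer isn't shown because it sits inside 
StoreFile.Writer.append(), around the bloom filter update:

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.regionserver.StoreFile;

public class CompactionTimer {
  // Hypothetical helper: accumulates nanoTime() around the scan and append
  // stages of a compaction loop. The bloom-only time would be measured
  // inside StoreFile.Writer.append(), around the bloom filter update.
  public static void timedCompactLoop(InternalScanner scanner,
      StoreFile.Writer writer) throws IOException {
    long scanNanos = 0, appendNanos = 0;
    List<KeyValue> kvs = new ArrayList<KeyValue>();
    boolean hasMore;
    do {
      kvs.clear();
      long t0 = System.nanoTime();
      hasMore = scanner.next(kvs);              // stage 1: Scanner.next
      scanNanos += System.nanoTime() - t0;
      for (KeyValue kv : kvs) {
        long t1 = System.nanoTime();
        writer.append(kv);                      // stage 3: append (stage 2,
        appendNanos += System.nanoTime() - t1;  // the bloom update, happens
      }                                         // inside append itself)
    } while (hasMore);
    System.out.println("Compaction scanTime (ns)    " + scanNanos);
    System.out.println("Compaction append time (ns) " + appendNanos);
  }
}
{code}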

This is what I saw for these three stages:

{code}
2010-10-11 11:25:39,774 INFO org.apache.hadoop.hbase.regionserver.Store: major Compaction scanTime (ns)         4338103000
2010-10-11 11:25:39,774 INFO org.apache.hadoop.hbase.regionserver.Store: major Compaction bloom only time (ns) 14433821000
2010-10-11 11:25:39,774 INFO org.apache.hadoop.hbase.regionserver.Store: major Compaction append time (ns)     23191478000
{code}
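
Converting the totals: scan ~4.3s, bloom ~14.4s, append ~23.2s, about 42s 
overall, which lines up with the 40-50 seconds above. With 10k rows x 1k 
columns that's roughly 10M KVs, so the bloom stage alone is costing ~1.4 
microseconds per cell. A ROWCOL bloom keys on row+qualifier, so every appended 
KeyValue implies building a fresh bloom key and hashing it. A rough sketch of 
that per-cell work (hypothetical helper, not the actual StoreFile.Writer 
internals):

{code}
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Bytes;

public class RowColBloomCost {
  // Hypothetical helper illustrating the per-cell cost of a ROWCOL bloom:
  // getRow() and getQualifier() each copy bytes out of the KeyValue, and
  // Bytes.add() copies again, so every cell implies fresh allocations plus
  // the hash of the resulting key.
  static byte[] rowColBloomKey(KeyValue kv) {
    return Bytes.add(kv.getRow(), kv.getQualifier());
  }
}
{code}

If that per-cell allocation and hashing turns out to be the bottleneck, reusing 
buffers or hashing the row/qualifier in place might be worth exploring.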

The HFile.getReadTime() and HFile.getWriteTime() numbers themselves seem pretty 
low (under a second). These are the times for the parts that interact with the 
DFS (readBlock() and finishBlock() mostly).

Are these numbers roughly in line with what others normally see? 

Will double-check my instrumentation and try to get more data. Might try 
running it under a profiler. But wanted to put this out there for additional 
input/ideas on improvements.



