[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212813#comment-13212813 ]

jirapos...@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------



bq.  On 2012-02-07 02:58:00, Todd Lipcon wrote:
bq.  > I tried the compression tool on a log created by YCSB in "load" mode with the standard dataset. Since the values are fairly large here (100 bytes) it didn't get a huge compression ratio - from about 64MB down to 52MB (~20%). But still not bad. I looked at the resulting data using xxd and it looks like there's still a number of places where we could use variable length integers instead of non-variable length. I wrote a quick C program to count the number of 0x00 bytes in the log and found about 3MB worth (~5%). Since the actual table data is all human-readable text in this case, all of the 0x00s should be able to be compressed away, I think.
bq.  > 
bq.  > I also tested on a YCSB workload where each row has 1000 columns of 4 bytes each (similar to an indexing workload) and the compression ratio was 60% (64M down to 25M) with another 4.2MB of 0x00 bytes which could probably be removed.

Checked it out. It looks like, in the YCSB workloads, the 0x00 bytes are 
actually indexes pointing to the 0th entry of the dictionary.
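
The zero-byte measurement quoted above was made with a quick C program that 
is not part of this thread. As a rough equivalent, here is a minimal Java 
sketch of the same count. The class and method names are hypothetical and 
are not part of the patch under review.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of the HBASE-4608 patch.
class ZeroByteCounter {

    /** Reads the stream to its end and returns {zeroBytes, totalBytes}. */
    static long[] countZeroBytes(InputStream in) throws IOException {
        long zeros = 0;
        long total = 0;
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                if (buf[i] == 0) {
                    zeros++;
                }
            }
            total += n;
        }
        return new long[] {zeros, total};
    }

    public static void main(String[] args) throws IOException {
        // Check the length first so an empty argument list never reaches args[0].
        if (args.length != 1) {
            System.err.println("usage: ZeroByteCounter <log file>");
            System.exit(1);
        }
        try (InputStream in =
                 new BufferedInputStream(Files.newInputStream(Paths.get(args[0])))) {
            long[] counts = countZeroBytes(in);
            System.out.printf("%d of %d bytes are 0x00%n", counts[0], counts[1]);
        }
    }
}
```

A raw count like this cannot tell padding inside fixed-width integers apart 
from zero-valued dictionary indexes, so it only gives an upper bound on what 
variable-length integers could save.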


bq.  On 2012-02-07 02:58:00, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 52
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line52>
bq.  >
bq.  >     invert the order of these || clauses - otherwise you get an out-of-bounds just running the tool with no arguments

fixed.
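
For readers without the diff open, the clause-order issue comes down to 
short-circuit evaluation of ||. The sketch below is illustrative only: the 
actual condition in Compressor.java is not shown in this thread, so the 
"--help" flag, the two-argument requirement, and all names are assumptions.

```java
// Hypothetical stand-in for the tool's argument check, not the patch code.
class ArgCheck {

    /** Unsafe order: args[0] is read before the length is known. */
    static boolean shouldPrintHelpUnsafe(String[] args) {
        return args[0].equals("--help") || args.length != 2;
    }

    /**
     * Safe order: || evaluates left to right and stops at the first true
     * operand, so args[0] is only read when args has exactly two entries.
     */
    static boolean shouldPrintHelp(String[] args) {
        return args.length != 2 || args[0].equals("--help");
    }

    public static void main(String[] args) {
        System.out.println("safe check on no arguments: "
            + shouldPrintHelp(new String[0]));
        try {
            shouldPrintHelpUnsafe(new String[0]);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unsafe check on no arguments threw: " + e);
        }
    }
}
```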


bq.  On 2012-02-07 02:58:00, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, lines 86-88
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line86>
bq.  >
bq.  >     this code doesn't work properly. Here's what you want to do:
bq.  >     
bq.  >           Configuration conf = new Configuration();
bq.  >           FileSystem fs = path.getFileSystem(conf);
bq.  >

fixed.


- Li


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4853
-----------------------------------------------------------


On 2012-02-15 04:57:45, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-15 04:57:45)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v5.txt, 
> 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends 
> across different datanodes. We can speed up this process by compressing the 
> HLog. Current plan involves using a dictionary to compress table name, region 
> id, cf name, and possibly other bits of repeated data. Also, HLog format may 
> be changed in other ways to produce a smaller HLog.
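
The dictionary idea in the description can be illustrated with a small 
sketch. This is not the patch's wire format or its LRUDictionary: the tag 
bytes, the fixed 16-bit index, the no-eviction policy, and every name below 
are assumptions chosen to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of dictionary compression for repeated fields such as
// table name, region id, and column family name. Not the HBASE-4608 format.
class DictionarySketch {
    static final byte LITERAL = 0;       // field follows in full
    static final byte REFERENCE = 1;     // 16-bit dictionary index follows
    static final int MAX_ENTRIES = 1 << 15;

    /** Encodes each field as a literal the first time and an index afterwards. */
    static byte[] encode(List<String> fields) throws IOException {
        Map<String, Integer> indexOf = new HashMap<>();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (String field : fields) {
            Integer idx = indexOf.get(field);
            if (idx != null) {
                out.writeByte(REFERENCE);
                out.writeShort(idx);
            } else {
                byte[] raw = field.getBytes(StandardCharsets.UTF_8);
                if (raw.length > 0xFFFF) {
                    throw new IllegalArgumentException("field too long for this sketch");
                }
                out.writeByte(LITERAL);
                out.writeShort(raw.length);
                out.write(raw);
                // The reader adds entries in the same order, so indexes stay in sync.
                if (indexOf.size() < MAX_ENTRIES) {
                    indexOf.put(field, indexOf.size());
                }
            }
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Rebuilds the dictionary while reading and returns the original fields. */
    static List<String> decode(byte[] data, int count) throws IOException {
        List<String> entries = new ArrayList<>();
        List<String> fields = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        for (int i = 0; i < count; i++) {
            if (in.readByte() == REFERENCE) {
                fields.add(entries.get(in.readUnsignedShort()));
            } else {
                byte[] raw = new byte[in.readUnsignedShort()];
                in.readFully(raw);
                String field = new String(raw, StandardCharsets.UTF_8);
                if (entries.size() < MAX_ENTRIES) {
                    entries.add(field);
                }
                fields.add(field);
            }
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        List<String> names = List.of("usertable", "usertable", "usertable");
        byte[] encoded = encode(names);
        System.out.println("encoded bytes: " + encoded.length);
        System.out.println("round trip ok: " + decode(encoded, names.size()).equals(names));
    }
}
```

In this sketch a repeated field costs three bytes (one tag byte plus a 
two-byte index) instead of its full length. A reference to entry 0 is 
written as two 0x00 bytes, which is one way a fixed-width index can show up 
as runs of zero bytes in a compressed log.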

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
