[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212813#comment-13212813 ]
jirapos...@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------

bq.  On 2012-02-07 02:58:00, Todd Lipcon wrote:
bq.  > I tried the compression tool on a log created by YCSB in "load" mode with the standard dataset. Since the values are fairly large here (100 bytes) it didn't get a huge compression ratio - from about 64MB down to 52MB (~20%). But still not bad. I looked at the resulting data using xxd and it looks like there's still a number of places where we could use variable length integers instead of non-variable length. I wrote a quick C program to count the number of 0x00 bytes in the log and found about 3MB worth (~5%). Since the actual table data is all human-readable text in this case, all of the 0x00s should be able to be compressed away, I think.
bq.  >
bq.  > I also tested on a YCSB workload where each row has 1000 columns of 4 bytes each (similar to an indexing workload) and the compression ratio was 60% (64M down to 25M) with another 4.2MB of 0x00 bytes which could probably be removed.

Checked it out. It looks like, in YCSB workloads, the 0x00 bytes are actually indexes pointing to the 0th entry of the dictionary.


bq.  On 2012-02-07 02:58:00, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 52
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line52>
bq.  >
bq.  >     invert the order of these || clauses - otherwise you get an out-of-bounds just running the tool with no arguments

Fixed.


bq.  On 2012-02-07 02:58:00, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, lines 86-88
bq.  > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line86>
bq.  >
bq.  >     this code doesn't work properly. Here's what you want to do:
bq.  >
bq.  >     Configuration conf = new Configuration();
bq.  >     FileSystem fs = path.getFileSystem(conf);
bq.  >

Fixed.

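
For illustration only, the fixed-width versus variable-length point in the first comment can be sketched as below. This is not code from the patch: the class name VarintSketch and all of its methods are hypothetical, and the encoding shown is a generic LEB128-style varint, which is an assumption and not necessarily the format the HLog code or Hadoop's own vint helpers use. It shows why small lengths written as 4-byte integers contribute 0x00 bytes, and that a varint for the value 0 is still a single 0x00 byte, which is consistent with the observation that index 0 of the dictionary shows up as zeros.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical illustration; not part of the HBASE-4608 patch.
class VarintSketch {

  // Fixed-width encoding: always 4 bytes, big-endian (ByteBuffer's default order).
  static byte[] encodeFixed(int value) {
    return ByteBuffer.allocate(4).putInt(value).array();
  }

  // LEB128-style varint: 7 payload bits per byte, high bit set when more bytes follow.
  static byte[] encodeVarint(int value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    while ((value & ~0x7F) != 0) {
      bytes.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    bytes.write(value);
    return bytes.toByteArray();
  }

  // Inverse of encodeVarint for a buffer that starts with one encoded value.
  static int decodeVarint(byte[] encoded) {
    int result = 0;
    int shift = 0;
    for (byte b : encoded) {
      result |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        break;
      }
      shift += 7;
    }
    return result;
  }

  // Same measurement as the "count the 0x00 bytes" check described in the comment.
  static int countZeroBytes(byte[] data) {
    int zeros = 0;
    for (byte b : data) {
      if (b == 0) {
        zeros++;
      }
    }
    return zeros;
  }

  public static void main(String[] args) {
    int length = 100; // e.g. the length prefix of a 100-byte value
    byte[] fixed = encodeFixed(length);
    byte[] varint = encodeVarint(length);
    System.out.println("fixed:  " + fixed.length + " bytes, " + countZeroBytes(fixed) + " zero");
    System.out.println("varint: " + varint.length + " bytes, " + countZeroBytes(varint) + " zero");
  }
}
```

The trade-off is that values above 2^28 - 1 take five bytes as a varint instead of four, so this only helps where most values are small.
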
- Li


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4853
-----------------------------------------------------------


On 2012-02-15 04:57:45, Li Pi wrote:
bq.
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.
bq.  (Updated 2012-02-15 04:57:45)
bq.
bq.
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.
bq.
bq.  Summary
bq.  -------
bq.
bq.  HLog compression. Has unit tests and a command line tool for compressing/decompressing.
bq.
bq.
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.
bq.
bq.  Diffs
bq.  -----
bq.
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION
bq.
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.
bq.
bq.  Testing
bq.  -------
bq.
bq.
bq.  Thanks,
bq.
bq.  Li
bq.
bq.

> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.
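
As a rough sketch of the dictionary idea in the issue description, repeated fields such as the table name can be replaced by a short index after their first appearance. Everything below is hypothetical and is not the patch's LRUDictionary or WALDictionary: the class name DictionarySketch, its methods, the -1 marker, and the 2-byte index and length sizes are assumptions made for illustration. It uses String keys where real WAL fields are byte arrays, and it leaves out the LRU eviction that a bounded dictionary would need.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of dictionary coding; not the patch's implementation.
class DictionarySketch {
  // Marker written in place of an index when the entry is sent as a literal.
  static final short NOT_IN_DICTIONARY = -1;

  private final Map<String, Short> indexByEntry = new HashMap<>();
  private final List<String> entryByIndex = new ArrayList<>();

  // Returns the index of a known entry. For an unknown entry, adds it and
  // returns NOT_IN_DICTIONARY so the caller writes the literal once.
  short findOrAdd(String entry) {
    Short index = indexByEntry.get(entry);
    if (index != null) {
      return index;
    }
    addEntry(entry);
    return NOT_IN_DICTIONARY;
  }

  // A reader calls this when it sees a literal, so that both sides assign
  // the same indexes in the same order. No capacity limit or eviction here.
  void addEntry(String entry) {
    indexByEntry.put(entry, (short) entryByIndex.size());
    entryByIndex.add(entry);
  }

  String getEntry(short index) {
    return entryByIndex.get(index);
  }

  // Assumed wire cost: 2-byte index for a known entry, otherwise a
  // 2-byte marker, a 2-byte length, and the literal bytes.
  static int encodedSize(DictionarySketch dictionary, String field) {
    short index = dictionary.findOrAdd(field);
    if (index != NOT_IN_DICTIONARY) {
      return 2;
    }
    return 2 + 2 + field.getBytes(StandardCharsets.UTF_8).length;
  }

  public static void main(String[] args) {
    DictionarySketch dictionary = new DictionarySketch();
    String tableName = "usertable"; // example name only
    System.out.println("first write:  " + encodedSize(dictionary, tableName) + " bytes");
    System.out.println("second write: " + encodedSize(dictionary, tableName) + " bytes");
  }
}
```

Because the first entry added receives index 0, a log dominated by one table and one column family would contain many index-0 references under a scheme like this.
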