[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176983#comment-13176983
 ] 

[email protected] commented on HBASE-4608:
------------------------------------------------------



bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 161
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line161>
bq.  >
bq.  >     use constant

fixed.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, 
line 48
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65772#file65772line48>
bq.  >
bq.  >     LOG.isDebugEnabled -- or maybe this should even be TRACE level

removed this completely, not needed.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, 
line 34
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65772#file65772line34>
bq.  >
bq.  >     private final

removed completely.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, 
line 32
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65772#file65772line32>
bq.  >
bq.  >     this should be all caps -- but also probably something from the 
configuration

changed


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java, line 
23
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65773#file65773line23>
bq.  >
bq.  >     does it have to be public?

now default.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, 
line 57
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65772#file65772line57>
bq.  >
bq.  >     hashCode() on a byte[] is identity-based - you should use 
Bytes.hashCode()

yup. i just figured this out. cost me a ton of pain. was wondering why things 
weren't compressing the way they should.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, 
lines 82-85
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65769#file65769line82>
bq.  >
bq.  >     indentation

fixed.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java,
 lines 144-150
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65770#file65770line144>
bq.  >
bq.  >     again the Context object here would make things a little cleaner to 
integrate:
bq.  >     - you can drop "compression" boolean and just check "if 
(compressionContext != null)"
bq.  >     - you only add one integration point to the existing code instead of 
lots of new member vars

will do in a refactoring pass.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, 
line 90
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65772#file65772line90>
bq.  >
bq.  >     I'd call this clear()

done.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, 
line 64
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65772#file65772line64>
bq.  >
bq.  >     equals is identity based here... should use Bytes.equals()
bq.  >     
bq.  >     Also Bytes.equals I believe handles nulls, so you can collapse two 
of these three clauses together

also just figured this out.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 
1655
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65768#file65768line1655>
bq.  >
bq.  >     since we have several methods that take all these parameters, and we 
might want to change the compression scheme in the future, I think it makes 
sense to introduce a class WALCompressionContext with getters for each of the 
dictionaries

Will make a compression context during refactoring.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, 
lines 57-58
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65766#file65766line57>
bq.  >
bq.  >     we should probably use vints here - most keys and many values are 
<100bytes long, so we could store the lengths in 1 byte instead of the 4 used 
here

Will do. I didn't bother compression the size values in KeyValue. Should do 
that as well - squeeze out extra space.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 70
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line70>
bq.  >
bq.  >     should have a finally { in.close(); } probably

fixed.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 28
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line28>
bq.  >
bq.  >     extra word "designed"?

fixed.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 33
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line33>
bq.  >
bq.  >     example should use arguments like "-u compressed-hlog 
uncompressed-hlog" rather than "filename" twice

fixed.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 37
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line37>
bq.  >
bq.  >     check args.length first and print help if it's not got 3 args

fixed.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
lines 43-45
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line43>
bq.  >
bq.  >     should be an 'else if' -- and have a final 'else' clause that gives 
usage

fixed.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 60
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line60>
bq.  >
bq.  >     TODO: need to change this config key to match our others

fixed.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
lines 66-69
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line66>
bq.  >
bq.  >     this assumes the whole log's content fits in memory, which shouldn't 
be necessary... why not loop reading one record from reader and writing one to 
writer?

will do in optimization pass.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 90
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line90>
bq.  >
bq.  >     should go in finally clause. Also use IOUtils.closeStream as long as 
"out" implements Closeable (I think it does?)

done.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
lines 114-116
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line114>
bq.  >
bq.  >     why not combine this with the if/else above?

because we need to write our size.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
line 133
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line133>
bq.  >
bq.  >     most of this byte is wasted - we're only using 2 of the 6 bits... 
and I think we could actually get rid of EMPTY as well.
bq.  >     
bq.  >     If we limit the dictionaries to 32k entries, then we could use the 
following:
bq.  >     
bq.  >     If bit 0 == 0: dictionary reference
bq.  >       bits 1 through 15: the dictionary index
bq.  >     if bit 0 == 1: new value
bq.  >       start a varint encoding in this byte
bq.  >     
bq.  >     but let's leave this as is for now just to get the rest of the 
code-level issues cleaned up

will do optimisation pass next.


bq.  On 2011-12-27 17:42:31, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, 
lines 153-159
bq.  > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line153>
bq.  >
bq.  >     rather than this, why not use varints here so you don't have to 
specify up front what the size is?

This is how KeyValue stores the length of its stuff. Didn't want to change 
that. will do during optimisation pass.


- Li


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4121
-----------------------------------------------------------


On 2011-12-23 06:00:24, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-12-23 06:00:24)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Heres what I have so far. Things are written, and "should work". I need to 
rework the test cases to test this, and put something in the config file to 
enable/disable. Obviously this isn't ready for commit at the moment, but I can 
get those two things done pretty quickly.
bq.  
bq.  Obviously the dictionary is incredibly simple at the moment, I'll come up 
with something cooler sooner. Let me know how this looks.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java 
PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 
f067221 
bq.    
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 d9cd6de 
bq.    
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 cbef70f 
bq.    
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java 
PRE-CREATION 
bq.    
src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java 
PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java 
e1117ef 
bq.    
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
59910bf 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends 
> across different datanodes. We can speed up this process by compressing the 
> HLog. Current plan involves using a dictionary to compress table name, region 
> id, cf name, and possibly other bits of repeated data. Also, HLog format may 
> be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to