[ 
https://issues.apache.org/jira/browse/HBASE-7413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13634435#comment-13634435
 ] 

Sergey Shelukhin commented on HBASE-7413:
-----------------------------------------

bq. Look in IPCUtils in the ipc package. See how it is used. We should figure 
out what is tough grokking Cell and CellBlock since if it is hard for you, it 
is going to be really hard for everyone else. Basically, we want to move away 
from KeyValue and instead use an Interface instead. The Interface is named 
Cell. rpc then has a notion of passing lots of Cells together in CellBlocks w/ 
the metadata on the block kept outside in an associated protobuf.
Hmm, nm, I was looking at it wrong. It seems to be straightforward. The usage 
of cells for both request and response is not very obvious :)
It does seem to result in extra copy though, when you build the cellblock, 
which is something we want to avoid for WAL path. Why is cell block ByteBuffer?
Currently the patch does a similar thing, minus the fancy stuff and the copy - 
write an optional protobuf field with number at the end of HLogKey (number of 
KVs, not size, so we don't need to serialize KVs in advance), then KVs directly 
to output stream.
We can add encoder and stuff there. It would require, aside from the task 
itself, adding a way to build cellblock directly into output without copy (and 
getting count from cellscanner in advance?), as well as redoing the way 
compression is currently done for HLog.Entry in general (moving it into 
compressor).
Should converting WAL to cells be separate JIRA? The format would be binary 
compatible, or nearly so.

bq. Is HLogKey a protobuf now (Haven't looked at patch)? If so, it is 
customizable? If it is not pb, should it be?
It is pb.
bq. I am still looking for my high level outline on this project with goal, and 
how you are going about it. I look at rb and it points here which has stuff 
distributed across multiple comments coming and going; hard to follow.
Here's the summary of the v0 patch.

h3. Current state
The WAL currently is a Hadoop sequence file with key being HLogKey, and value 
WALEdit, both writables. Hadoop sequence file contains some magic prefix, 
followed by metadata dictionary, and then alternating key and value writables, 
prefixed by sizes.
HLogKey contains table name, encoded region name, seqId, write time, and 
cluster ID for replication (only set in non-default, i.e. non original, 
clusters).
WALEdit contains KVs and replication scopes. The existing peculiarities in 
HLogKey and WALEdit format make it hard or impossible to make changes to them.
The sequence file metadata contains the indication of whether the file has 
compression. Compression uses dictionary encoding for table names, region names 
in HLogKey, as well as rows, families and qualifiers in KVs.
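
For readers unfamiliar with dictionary encoding, here is a minimal sketch of the 
general idea, not the actual HBase compressor. The class name, the one-byte 
LITERAL/REFERENCE markers and the unbounded map are all assumptions made for 
illustration; a real dictionary would be bounded and work on byte arrays.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of dictionary encoding for repeated names.
final class DictionarySketch {
    private static final byte LITERAL = 0;    // assumed marker: full value follows
    private static final byte REFERENCE = 1;  // assumed marker: dictionary index follows

    // Unbounded for simplicity; assumes fewer than 32768 distinct values.
    private final Map<String, Short> indexByValue = new HashMap<>();

    void write(DataOutputStream out, String value) throws IOException {
        Short index = indexByValue.get(value);
        if (index != null) {
            // Seen before: emit only the marker and the 2-byte index.
            out.writeByte(REFERENCE);
            out.writeShort(index);
            return;
        }
        // First occurrence: emit the literal and remember its index.
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        out.writeByte(LITERAL);
        out.writeShort(bytes.length);
        out.write(bytes);
        indexByValue.put(value, (short) indexByValue.size());
    }
}
```

The first write of a name costs its full length plus three bytes; every repeat 
costs three bytes, which is where the saving on table and region names comes 
from.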
h3. Goals
Make WAL extensible without sacrificing too much perf.
h3. New WAL format
New WAL format is logically similar to old WAL format. It starts with 4-byte 
magic that allows us to tell the old and new file apart.
That is followed by extensible PB WALHeader (file metadata), written using 
writeDelimited. It currently contains the flag indicating whether the file has 
compression.
After that follow pairs of WAL record header ("HLogKey" for compat with existing 
code) and WALEdit. Replication scopes have been moved to HLogKey from WALEdit, 
so the latter only has KVs.
WAL record header is an extensible PB structure. Compression for this is 
supported as before (byte arrays can contain dictionary encoding), with 
additional compression for replication scope column family names (using exactly 
the same approach).
"WALEdit" essentially becomes just KVs. One of the fields of WAL record header 
is the number of KVs that WALEdit contains. 
To avoid memory copies on KVs that are potentially large, KVs are written 
directly into the file without the intervening protobuf step. PB handles bytes 
only in the form of ByteString, which is immutable in such a way that a memory 
copy cannot be avoided (unless the KV itself were backed by a single ByteString 
w/o need to serialize, which is not such a bad idea but is out of the scope of 
this JIRA).
KVs are written using a previously existing mechanism - VInt with length, 
followed by the backing byte array (which technically has nothing to do with 
writables, it's just raw format :)), or compressed format.
Reader reads the number of KVs indicated by WAL record header field, and 
assumes that these are followed by the next WAL record header.
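
The record layout described above can be sketched roughly as follows. This is 
an illustration under stated assumptions, not code from the patch: a plain int 
stands in for the PB record header's KV count, the varint is a simplified 
base-128 encoding rather than Hadoop's VInt, and the class and method names are 
hypothetical.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a count in the header, then length-prefixed KV bytes.
final class WalRecordSketch {
    // Simplified base-128 varint; a stand-in, not Hadoop's VInt encoding.
    static void writeVarInt(DataOutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.writeByte((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.writeByte(value);
    }

    static int readVarInt(DataInputStream in) throws IOException {
        int result = 0;
        int shift = 0;
        while (true) {
            int b = in.readUnsignedByte();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    static void writeRecord(DataOutputStream out, List<byte[]> kvs) throws IOException {
        out.writeInt(kvs.size());        // stand-in for the header's KV count field
        for (byte[] kv : kvs) {
            writeVarInt(out, kv.length); // length prefix
            out.write(kv);               // backing array goes straight to the stream
        }
    }

    static List<byte[]> readRecord(DataInputStream in) throws IOException {
        int count = in.readInt();        // the reader trusts the count from the header
        List<byte[]> kvs = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] kv = new byte[readVarInt(in)];
            in.readFully(kv);
            kvs.add(kv);
        }
        return kvs;                      // the next bytes belong to the next record
    }
}
```

Because the header carries a count rather than a byte size, the writer never 
needs to serialize the KVs in advance to learn their total length.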
h3. Supporting legacy WALs.
Writing legacy WALs is no longer supported. The writer class is moved to test 
code. HLogKey and WALEdit writable write methods are preserved for writing WAL 
in backward compat test, and output a warning.
Reading legacy WALs is supported thru HLogFactory::createReader. This method 
opens the stream and tries to read PB magic. If PB magic is present, new PB 
reader is returned; if it's not, it falls back to old SequenceFile-based reader.
Both readers derive from a common class that contains some shared functionality 
and interface (e.g. ::hasCompression()).
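
The magic-based fallback can be sketched as below. This is an illustration 
only, not the HLogFactory code: the class, the enum and the "XWAL" magic value 
are hypothetical (this comment does not give the real bytes), and a mark/reset 
capable InputStream stands in for the actual file stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: pick a reader by peeking at the first four bytes.
final class WalReaderSelectionSketch {
    // Placeholder 4-byte magic; the real value is an assumption left open here.
    static final byte[] MAGIC = "XWAL".getBytes(StandardCharsets.US_ASCII);

    enum Format { PROTOBUF, LEGACY_SEQUENCE_FILE }

    static Format detect(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(MAGIC.length);
        byte[] head = new byte[MAGIC.length];
        int read = 0;
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) {
                break; // file shorter than the magic
            }
            read += n;
        }
        in.reset(); // rewind so the chosen reader sees the file from the start
        boolean hasMagic = read == MAGIC.length && Arrays.equals(head, MAGIC);
        return hasMagic ? Format.PROTOBUF : Format.LEGACY_SEQUENCE_FILE;
    }
}
```

Anything that does not start with the magic is handed to the legacy path, so a 
sequence file (which begins with the bytes "SEQ") falls through to the old 
reader without any extra check.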
h3. Future improvement.
Ideally, given that they are used together for all practical purposes, we want 
to get rid of HLogKey and WALEdit (except for backward compat-related usage) 
and move to the notion of WAL record as a single thing (HLog.Entry?). However, 
refactoring that all over the place is out of the scope of this JIRA.


                
> Convert WAL to pb
> -----------------
>
>                 Key: HBASE-7413
>                 URL: https://issues.apache.org/jira/browse/HBASE-7413
>             Project: HBase
>          Issue Type: Sub-task
>          Components: wal
>            Reporter: stack
>            Assignee: Sergey Shelukhin
>            Priority: Critical
>             Fix For: 0.95.1
>
>         Attachments: HBASE-7413-v0.patch
>
>
> From HBASE-7201

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
