[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239958#comment-14239958
 ] 

Jeffrey Zhong commented on HBASE-10201:
---------------------------------------

This is a nice feature. I scanned through the patch; my comments are below:

1) There may be a correctness issue for same-version updates (same row key and 
version). Because you use the following code to obtain the store file flush id, 
we could end up with multiple hstore files that have exactly the same flush seq 
id. HBase resolves same-version updates by the store files' seqid (flush id), 
so we may end up with incorrect results. This issue may only happen in 0.98, 
though.
{code}
+          long oldestUnflushedSeqId = wal
+              .getEarliestMemstoreSeqNum(encodedRegionName);
{code} 
To fix the issue, we should use the current store's max flushed seq id as 
its real hstore seq id. We also need to change HRegion.lastFlushSeqId to use 
oldestUnflushedSeqId when reporting back to the Master; otherwise we may have a 
data loss issue.
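
To illustrate the concern, here is a minimal, self-contained sketch (not HBase 
code; SeqIdTieBreakSketch, CellCopy and resolve() are hypothetical names). It 
assumes a same-version conflict is settled by picking the copy from the store 
file with the highest seq id, and shows that the choice becomes ambiguous when 
two files share that id.

```java
import java.util.List;

final class SeqIdTieBreakSketch {
    /** Hypothetical stand-in for one store file's copy of a cell (same row key and version). */
    static final class CellCopy {
        final long fileSeqId; // seq id of the store file that holds this copy
        final String value;

        CellCopy(long fileSeqId, String value) {
            this.fileSeqId = fileSeqId;
            this.value = value;
        }
    }

    /**
     * Returns the value from the file with the highest seq id, or null when
     * the list is empty or the highest seq id is shared by several files.
     */
    static String resolve(List<CellCopy> copies) {
        CellCopy best = null;
        boolean tie = false;
        for (CellCopy c : copies) {
            if (best == null || c.fileSeqId > best.fileSeqId) {
                best = c;
                tie = false; // a strictly higher id clears any earlier tie
            } else if (c.fileSeqId == best.fileSeqId) {
                tie = true; // two files claim the same flush seq id
            }
        }
        return (best == null || tie) ? null : best.value;
    }
}
```

With seq ids 10 and 20 the copy from file 20 wins; with two files at seq id 10 
the sketch returns null, which is the situation unique flush ids are meant to 
rule out.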

2) We have a feature that forces a flush via 
hbase.regionserver.optionalcacheflushinterval or 
hbase.regionserver.flush.per.changes, but I didn't see you handle either case 
in the selectStoresToFlush() function. This may cause HRegion.shouldFlush() to 
always return true, and we would end up with small hstore files.
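
One possible way to handle this, sketched outside of HBase: besides the size 
check, also select any store whose oldest unflushed edit has exceeded the 
forced-flush interval, so that the condition that triggered the flush is 
actually cleared. The class, fields and parameters below are hypothetical; only 
the method name selectStoresToFlush comes from the patch, and this is not its 
real signature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class FlushSelectionSketch {
    /** Hypothetical per-store state used only for this illustration. */
    static final class StoreState {
        final long memstoreBytes;   // current memstore size of the store
        final long oldestEditAgeMs; // age of the oldest unflushed edit

        StoreState(long memstoreBytes, long oldestEditAgeMs) {
            this.memstoreBytes = memstoreBytes;
            this.oldestEditAgeMs = oldestEditAgeMs;
        }
    }

    /**
     * Selects stores that are large enough OR hold edits older than the
     * forced-flush interval. Falls back to all stores if nothing qualifies.
     */
    static List<String> selectStoresToFlush(Map<String, StoreState> stores,
                                            long sizeThresholdBytes,
                                            long maxEditAgeMs) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, StoreState> e : stores.entrySet()) {
            StoreState s = e.getValue();
            if (s.memstoreBytes >= sizeThresholdBytes
                    || s.oldestEditAgeMs >= maxEditAgeMs) {
                selected.add(e.getKey());
            }
        }
        if (selected.isEmpty()) {
            selected.addAll(stores.keySet()); // nothing qualified: flush everything
        }
        return selected;
    }
}
```

A similar extra condition could cover the flush.per.changes trigger; the sketch 
leaves that out to stay short.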

3) For region server recovery, we have an optimization that uses the 
lastFlushSeqId reported by region servers to skip writing edits into 
recovered.edits files. With this feature, we may unnecessarily write much more 
data into recovered.edits. This issue doesn't happen in the log replay case.
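
A rough sketch of why more data may be written, under the assumption that the 
region-wide id reported for recovery has to be the smallest of the per-store 
flushed ids once stores flush independently. All names here are hypothetical 
and this is not the actual log splitting code.

```java
import java.util.Collection;
import java.util.Collections;

final class RecoverySkipSketch {
    /**
     * Region-wide id that is safe to report: the minimum over the stores'
     * max flushed seq ids. Throws NoSuchElementException on an empty input.
     */
    static long regionLastFlushedSeqId(Collection<Long> perStoreMaxFlushedSeqId) {
        return Collections.min(perStoreMaxFlushedSeqId);
    }

    /** An edit can be skipped when it is already covered by a flush. */
    static boolean canSkip(long editSeqId, long lastFlushedSeqId) {
        return editSeqId <= lastFlushedSeqId;
    }

    /** Counts the edits that would still have to go into recovered.edits. */
    static int countEditsToWrite(long[] editSeqIds, long lastFlushedSeqId) {
        int toWrite = 0;
        for (long seqId : editSeqIds) {
            if (!canSkip(seqId, lastFlushedSeqId)) {
                toWrite++;
            }
        }
        return toWrite;
    }
}
```

For example, if one store has flushed up to 100 and another only up to 40, the 
region-wide id is 40, so edits 50 and 90 are written again even though they may 
belong to the store that already flushed them.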

4) Regarding your FlushMarker question: FlushMarker (and the similar 
RegionEventWALEdit) is used for the region replica feature and for reasoning 
about region/store state. As you can see in the WALEdit class, those special 
events use the special column family "METAFAMILY", which doesn't exist for data 
regions. You should handle those events specially in getFamilyNames(); 
otherwise they may affect your bookkeeping of the oldest unflushed seqid.
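
As an illustration only, the special handling could amount to dropping the meta 
family before the per-family bookkeeping is updated. The sketch below uses 
plain strings and hypothetical names (FamilyBookkeepingSketch, dataFamilies, 
recordOldestUnflushed); it is not the patch's getFamilyNames().

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class FamilyBookkeepingSketch {
    /** Family name carried by the special WAL events, as mentioned above. */
    static final String META_FAMILY = "METAFAMILY";

    /** Returns the families of an edit without the special meta family. */
    static Set<String> dataFamilies(Collection<String> familiesInEdit) {
        Set<String> result = new LinkedHashSet<>(familiesInEdit);
        result.remove(META_FAMILY);
        return result;
    }

    /**
     * Records seqId as the oldest unflushed seq id for each data family that
     * has no entry yet; marker-only edits leave the map untouched.
     */
    static void recordOldestUnflushed(Map<String, Long> oldestUnflushed,
                                      Collection<String> familiesInEdit,
                                      long seqId) {
        for (String family : dataFamilies(familiesInEdit)) {
            oldestUnflushed.putIfAbsent(family, seqId);
        }
    }
}
```

An edit that only carries "METAFAMILY" then never creates an entry, so it 
cannot hold back the oldest unflushed seqid of a real store.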


> Port 'Make flush decisions per column family' to trunk
> ------------------------------------------------------
>
>                 Key: HBASE-10201
>                 URL: https://issues.apache.org/jira/browse/HBASE-10201
>             Project: HBase
>          Issue Type: Improvement
>          Components: wal
>            Reporter: Ted Yu
>            Assignee: zhangduo
>            Priority: Critical
>             Fix For: 1.0.0, 2.0.0, 0.98.9
>
>         Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
> HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
> HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
> HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
> HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, 
> HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_2.patch, 
> HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, 
> HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, 
> HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png
>
>
> Currently the flush decision is made using the aggregate size of all column 
> families. When large and small column families co-exist, this causes many 
> small flushes of the smaller CF. We need to make per-CF flush decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
