[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239958#comment-14239958 ]
Jeffrey Zhong commented on HBASE-10201:
---------------------------------------
This is a nice feature. I scanned through the patch; my comments are below:
1) There may be a correctness issue for same-version updates (same row key and
version). Because you use the following code as the store file flush id, we
could end up with multiple hstore files that have exactly the same flush seq id.
Since HBase resolves same-version updates by the store files' seqid (flush id),
we may end up with incorrect results. This issue may only happen in 0.98, though.
{code}
+ long oldestUnflushedSeqId = wal
+ .getEarliestMemstoreSeqNum(encodedRegionName);
{code}
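To make the tie concrete, here is a toy Java model. The classes StoreFileModel and SameVersionResolver are hypothetical and are not HBase's real store file or KeyValue comparison code. The model keeps the value from the file with the highest seq id, so when two files carry the same seq id, the result depends only on the order in which the files are listed:

```java
import java.util.List;

// Hypothetical, simplified store file: only its flush seq id and the value
// it holds for one (row key, column, version) cell.
final class StoreFileModel {
  final long seqId;
  final String value;

  StoreFileModel(long seqId, String value) {
    this.seqId = seqId;
    this.value = value;
  }
}

final class SameVersionResolver {
  // Returns the value from the file with the highest seq id. On a tie the
  // first file in the list wins, so with duplicate seq ids the answer is
  // decided by list order rather than by which flush happened later.
  static String resolve(List<StoreFileModel> files) {
    StoreFileModel winner = null;
    for (StoreFileModel f : files) {
      if (winner == null || f.seqId > winner.seqId) {
        winner = f;
      }
    }
    return winner == null ? null : winner.value;
  }

  public static void main(String[] args) {
    // Two flushes of the same store, both stamped with the same
    // region-wide "earliest memstore seq num".
    StoreFileModel firstFlush = new StoreFileModel(100L, "old");
    StoreFileModel secondFlush = new StoreFileModel(100L, "new");
    System.out.println(resolve(List.of(firstFlush, secondFlush))); // old
    System.out.println(resolve(List.of(secondFlush, firstFlush))); // new
  }
}
```

Giving each file of a store a distinct, increasing seq id removes this dependence on list order.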
To fix the issue, we should use the current store's max flushed seq id as its
real hstore seq id. At the same time, we need to change HRegion.lastFlushSeqId
to use oldestUnflushedSeqId when reporting back to the Master; otherwise, edits
of stores that have not been flushed yet could be skipped during recovery,
causing data loss.
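A minimal sketch of the two values, using a toy model. The class PerFamilyFlushModel and its methods are hypothetical and are not HBase's HStore/HRegion API. Each flushed file is stamped with the highest seq id its own store flushed. The region-level value stops just below the oldest seq id still unflushed in any store. Reporting oldestUnflushedSeqId - 1 as "flushed up to" is an assumption of this sketch:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical toy model of per-family memstores; not HBase's HStore/HRegion.
final class PerFamilyFlushModel {
  // Seq ids of the edits still sitting in each family's memstore.
  private final Map<String, TreeSet<Long>> unflushed = new HashMap<>();
  private long lastAssignedSeqId = 0;

  long append(String family) {
    long seqId = ++lastAssignedSeqId;
    unflushed.computeIfAbsent(family, f -> new TreeSet<>()).add(seqId);
    return seqId;
  }

  // Flushes one family and returns the seq id stamped on the new file: the
  // max seq id this store flushed, not the region-wide oldest unflushed one.
  long flush(String family) {
    TreeSet<Long> edits = unflushed.get(family);
    if (edits == null || edits.isEmpty()) {
      throw new IllegalStateException("nothing to flush for " + family);
    }
    long fileSeqId = edits.last();
    edits.clear();
    return fileSeqId;
  }

  // Region-level value for the Master: every edit below the oldest unflushed
  // seq id (across all families) is already in a store file.
  long flushedUpTo() {
    long oldestUnflushed = Long.MAX_VALUE;
    for (TreeSet<Long> edits : unflushed.values()) {
      if (!edits.isEmpty()) {
        oldestUnflushed = Math.min(oldestUnflushed, edits.first());
      }
    }
    return oldestUnflushed == Long.MAX_VALUE ? lastAssignedSeqId : oldestUnflushed - 1;
  }

  public static void main(String[] args) {
    PerFamilyFlushModel region = new PerFamilyFlushModel();
    region.append("small");                    // seq id 1, stays in memstore
    region.append("large");                    // seq id 2
    region.append("large");                    // seq id 3
    System.out.println(region.flush("large")); // 3
    region.append("large");                    // seq id 4
    System.out.println(region.flush("large")); // 4, distinct from the first file
    System.out.println(region.flushedUpTo());  // 0, seq id 1 is still unflushed
  }
}
```

With the earliest-memstore-seq-num approach, both "large" files above would have been stamped 1. That is the duplicate seq id case described in point 1.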
2) We have a feature that forces a flush based on
hbase.regionserver.optionalcacheflushinterval or
hbase.regionserver.flush.per.changes, but I didn't see you handle either case
in the selectStoresToFlush() function. This may cause HRegion.shouldFlush() to
always return true and end up producing many small hstore files.
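One possible shape for the check, as a hedged sketch. The class, field and parameter names are hypothetical. The size lower bound stands in for whatever per-family threshold the patch uses, and the two trigger inputs correspond to hbase.regionserver.optionalcacheflushinterval and hbase.regionserver.flush.per.changes. A store whose old edit triggered the periodic flush is always selected, and the change-count trigger flushes all stores, so the condition that caused the flush does not survive it:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-store snapshot used only for this sketch.
final class StoreSnapshot {
  final String family;
  final long memstoreBytes;
  final long oldestEditTimeMs; // Long.MAX_VALUE when the memstore is empty

  StoreSnapshot(String family, long memstoreBytes, long oldestEditTimeMs) {
    this.family = family;
    this.memstoreBytes = memstoreBytes;
    this.oldestEditTimeMs = oldestEditTimeMs;
  }
}

final class FlushSelector {
  static List<String> selectStoresToFlush(List<StoreSnapshot> stores, long lowerBoundBytes,
      long nowMs, long flushIntervalMs, long changesSinceFlush, long flushPerChanges) {
    List<String> all = new ArrayList<>();
    for (StoreSnapshot s : stores) {
      all.add(s.family);
    }
    // Change-count trigger: the count is region-wide, so this sketch flushes
    // every store rather than risk leaving the trigger condition in place.
    if (changesSinceFlush > flushPerChanges) {
      return all;
    }
    List<String> selected = new ArrayList<>();
    for (StoreSnapshot s : stores) {
      boolean large = s.memstoreBytes >= lowerBoundBytes;
      // Age trigger: a store holding an edit older than the interval is
      // flushed too, so the periodic check stops firing because of it.
      boolean stale = s.oldestEditTimeMs != Long.MAX_VALUE
          && nowMs - s.oldestEditTimeMs > flushIntervalMs;
      if (large || stale) {
        selected.add(s.family);
      }
    }
    // Fall back to flushing everything when no single store qualifies.
    return selected.isEmpty() ? all : selected;
  }

  public static void main(String[] args) {
    List<StoreSnapshot> stores = List.of(
        new StoreSnapshot("big", 20_000_000L, 3_000_000L),
        new StoreSnapshot("small", 1_000L, 1_000L));
    long oneHourMs = 3_600_000L;
    // Nothing stale, few changes: only the large store is flushed.
    System.out.println(selectStoresToFlush(stores, 16_000_000L, 3_500_000L, oneHourMs, 10L, 30_000_000L));
    // "small" now holds an edit older than one hour, so it is flushed as well.
    System.out.println(selectStoresToFlush(stores, 16_000_000L, 4_000_000L, oneHourMs, 10L, 30_000_000L));
  }
}
```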
3) For region server recovery, we have an optimization that uses the
lastFlushSeqId reported by region servers to skip writing already-flushed edits
into recovered.edits files. With this feature, the reported value can only
advance as far as the region's oldest unflushed seq id, so we may unnecessarily
write much more data into recovered.edits. This issue doesn't happen in the log
replay case.
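A toy comparison of how many WAL entries end up in recovered.edits. WalEntry, RecoveredEditsFilter and both counting methods are hypothetical and are not HBase's log splitting code. When a single old edit in a small family holds back the one region-level flushed seq id, almost every entry is rewritten. The per-family variant is included only to show the size of the gap, not as something the current code does:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical WAL entry: the family it touched and its seq id.
final class WalEntry {
  final String family;
  final long seqId;

  WalEntry(String family, long seqId) {
    this.family = family;
    this.seqId = seqId;
  }
}

final class RecoveredEditsFilter {
  // Region-level skip: an entry is written unless its seq id is at or below
  // the single flushed seq id the region reported.
  static int countWritten(List<WalEntry> entries, long regionFlushedUpTo) {
    int written = 0;
    for (WalEntry e : entries) {
      if (e.seqId > regionFlushedUpTo) {
        written++;
      }
    }
    return written;
  }

  // Hypothetical per-family skip, for comparison only: each family's own
  // flushed-up-to value decides whether its entries are rewritten.
  static int countWrittenPerFamily(List<WalEntry> entries, Map<String, Long> familyFlushedUpTo) {
    int written = 0;
    for (WalEntry e : entries) {
      long flushed = familyFlushedUpTo.getOrDefault(e.family, 0L);
      if (e.seqId > flushed) {
        written++;
      }
    }
    return written;
  }

  public static void main(String[] args) {
    List<WalEntry> entries = new ArrayList<>();
    entries.add(new WalEntry("small", 1L)); // still only in the small memstore
    for (long seqId = 2; seqId <= 100; seqId++) {
      entries.add(new WalEntry("large", seqId)); // already flushed
    }
    System.out.println(countWritten(entries, 0L)); // 100
    System.out.println(countWrittenPerFamily(entries, Map.of("small", 0L, "large", 100L))); // 1
  }
}
```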
4) Regarding your FlushMarker question: FlushMarker (and similar
RegionEventWALEdit) edits are used by the region replica feature and for
reasoning about region/store state. As you can see in the WALEdit class, these
special events use a special column family, "METAFAMILY", which doesn't exist in
data regions. You should handle these events specially in getFamilyNames();
otherwise they may affect your bookkeeping of the oldest unflushed seqid.
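A minimal sketch of that special-casing, assuming each edit is reduced to the list of family names its cells carry. FamilyNames and this getFamilyNames signature are hypothetical, and the method in the patch may look different. The "METAFAMILY" bytes are defined locally so the sketch does not depend on HBase classes:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class FamilyNames {
  // Local copy of the marker family name; not a reference to HBase's constant.
  static final byte[] METAFAMILY = "METAFAMILY".getBytes(StandardCharsets.UTF_8);

  // Collects the data families touched by one WAL edit. Marker cells (flush
  // and region event markers) are skipped so they never create or move an
  // "oldest unflushed seq id" entry for a family that does not exist.
  static Set<String> getFamilyNames(List<byte[]> cellFamilies) {
    Set<String> names = new TreeSet<>();
    for (byte[] family : cellFamilies) {
      if (Arrays.equals(family, METAFAMILY)) {
        continue;
      }
      names.add(new String(family, StandardCharsets.UTF_8));
    }
    return names;
  }

  public static void main(String[] args) {
    List<byte[]> families = List.of(
        "cf1".getBytes(StandardCharsets.UTF_8),
        METAFAMILY,
        "cf2".getBytes(StandardCharsets.UTF_8));
    System.out.println(getFamilyNames(families)); // [cf1, cf2]
  }
}
```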
> Port 'Make flush decisions per column family' to trunk
> ------------------------------------------------------
>
> Key: HBASE-10201
> URL: https://issues.apache.org/jira/browse/HBASE-10201
> Project: HBase
> Issue Type: Improvement
> Components: wal
> Reporter: Ted Yu
> Assignee: zhangduo
> Priority: Critical
> Fix For: 1.0.0, 2.0.0, 0.98.9
>
> Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch,
> HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch,
> HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch,
> HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch,
> HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch,
> HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_2.patch,
> HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch,
> HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch,
> HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png
>
>
> Currently the flush decision is made using the aggregate size of all column
> families. When large and small column families co-exist, this causes many
> small flushes of the smaller CF. We need to make per-CF flush decisions.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)