[
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215113#comment-14215113
]
stack commented on HBASE-10201:
-------------------------------
bq. We need to change protobuf definition
We could add extra fields in pb and write to two places for the life of an
hbase version to support rolling upgrade.
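For example, a minimal plain-Java sketch of the dual-write (the message and
field names here -- RegionReport, lastFlushedSeqId, familyLastFlushedSeqId --
are hypothetical stand-ins, not the actual pb definitions): during the
transition version we'd fill both the old region-scoped field and the new
per-family map so either generation of master can read the report.
{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the generated pb builder; names are illustrative only.
public class DualWriteSketch {

  static class RegionReport {
    long lastFlushedSeqId;                                       // old, region-scoped field
    Map<String, Long> familyLastFlushedSeqId = new HashMap<>();  // new, per-family field
  }

  static RegionReport buildReport(Map<String, Long> familyToMaxFlushedSeqId) {
    RegionReport report = new RegionReport();
    long regionMax = Long.MIN_VALUE;
    for (Map.Entry<String, Long> e : familyToMaxFlushedSeqId.entrySet()) {
      report.familyLastFlushedSeqId.put(e.getKey(), e.getValue()); // new servers read this
      regionMax = Math.max(regionMax, e.getValue());
    }
    report.lastFlushedSeqId = regionMax;                           // old servers keep reading this
    return report;
  }
}
{code}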
I hope you do not mind my surfacing here questions asked off-list -- it's best
to keep the discussion up here rather than off-list so others can participate
too.
You described off-list how distributed log replay opens a region, puts the
highest *sequenceid* found up in zk, and then uses this to figure out which
edits to replay. You also describe how regionServerReport includes the last
flush id of each region we carry, and how the master keeps this around so that
on log replay we can skip edits already flushed. You then ask:
bq. I think I need to change all these places to use a map which stored
familyName->maxSeqId instead of a single SeqId. Am I right?
The sequenceid is *region-scoped*: i.e. we keep one running sequenceid per
region. For the above to work out, we'd need to change the sequenceid scope to
be column-family rather than region. Since our memstore is per column family,
and since the memstore now uses the region sequenceid as its MVCC, this might
be a good direction to go in, but it is not what we have now.
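Concretely, the change you are describing would look something like this sketch
(illustrative names only, not existing HBase classes): the master's
region -> last flushed sequenceid record becomes a nested familyName -> maxSeqId
map, and replay skips an edit only if its family has already flushed past it.
{code:java}
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Sketch only; WalEdit and the map names are hypothetical stand-ins.
public class FamilyScopedReplaySketch {

  static class WalEdit {
    String regionName;
    String family;
    long seqId;
  }

  // flushedSeqIdByRegionAndFamily: region -> (familyName -> max flushed sequenceid),
  // built from the per-family data carried in regionServerReport.
  static void replay(List<WalEdit> edits,
                     Map<String, Map<String, Long>> flushedSeqIdByRegionAndFamily) {
    for (WalEdit edit : edits) {
      long flushed = flushedSeqIdByRegionAndFamily
          .getOrDefault(edit.regionName, Collections.emptyMap())
          .getOrDefault(edit.family, -1L);
      if (edit.seqId > flushed) {
        apply(edit);   // newer than anything this family has flushed, so replay it
      }                // else: already persisted in this family's storefiles, skip
    }
  }

  static void apply(WalEdit edit) {
    // stand-in for re-inserting the edit into the recovering region's memstore
  }
}
{code}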
You cannot have discontinuities in the progress of the flush sequenceid. If a
region has four column families, edits can go into any of the four families in
any order; for example, edits 1 and 3 might land in one family while 2 and 4
land in another, so flushing the first family alone does not make everything up
through 3 durable.
You could do something like [~gaurav.menghani] suggests above (see
https://issues.apache.org/jira/browse/HBASE-10201?focusedCommentId=14191203&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14191203):
rather than reporting, on a successful flush, the highest sequenceid of all of
a region's memstores involved in the flush, when you flush only one column
family you'd instead report one less than the sequenceid of the oldest
outstanding edit still alive in any column-family memstore.
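A minimal sketch of that reporting rule, with hypothetical names (the map is
assumed to track the oldest outstanding edit per column-family memstore):
{code:java}
import java.util.Map;

// Sketch only; names are illustrative, not existing HBase classes.
public class SafeFlushSeqIdSketch {

  /**
   * @param oldestUnflushedSeqIdByFamily oldest outstanding edit in each family's
   *        memstore; families whose memstore is empty are absent from the map
   * @param lastWrittenSeqId the region's current sequenceid
   * @return the highest sequenceid it is safe to advertise as flushed
   */
  static long safeSeqIdToReport(Map<String, Long> oldestUnflushedSeqIdByFamily,
                                long lastWrittenSeqId) {
    long oldestOutstanding = Long.MAX_VALUE;
    for (long seqId : oldestUnflushedSeqIdByFamily.values()) {
      oldestOutstanding = Math.min(oldestOutstanding, seqId);
    }
    if (oldestOutstanding == Long.MAX_VALUE) {
      return lastWrittenSeqId;    // every memstore is empty, so all edits are durable
    }
    return oldestOutstanding - 1; // everything below this is flushed; nothing above is guaranteed
  }
}
{code}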
What if you did something much less involved: when there is pressure to flush,
flush the stores with the oldest edits until you've freed enough memory?
Upsides are that you'd clear out old edits from memory and we might let go of
WALs a little faster. Also, you might not flush all of the content in a region
-- because flushing just a few stores might be enough to get you back under the
threshold -- so we might make fewer small storefiles?
Downsides are that we'd make some small storefiles (e.g. for those stores that
have a few old edits in them and little else) and we'd do the flushes in series
rather than in parallel. Because of sequenceid accounting, we might replay more
edits than we have to.
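To make the flush-the-oldest-stores idea above concrete, here is a rough sketch
(Store and flushStore are hypothetical stand-ins, not HBase classes); note the
flushes run in series, which is the downside mentioned:
{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch only: under memory pressure, flush the stores holding the oldest
// edits, one at a time, until enough heap has been freed.
public class OldestFirstFlushSketch {

  interface Store {
    long oldestEditSeqId();   // sequenceid of the oldest edit in this store's memstore
    long memstoreSize();      // bytes held by this store's memstore
  }

  static void flushUnderPressure(List<Store> stores, long bytesToFree) {
    List<Store> byAge = new ArrayList<>(stores);
    byAge.sort(Comparator.comparingLong(Store::oldestEditSeqId)); // oldest edits first
    long freed = 0;
    for (Store store : byAge) {
      if (freed >= bytesToFree) {
        break;                 // back under the threshold, stop flushing
      }
      freed += store.memstoreSize();
      flushStore(store);       // flushes happen in series, not in parallel
    }
  }

  static void flushStore(Store store) {
    // stand-in for writing the store's memstore out as a new storefile
  }
}
{code}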
> Port 'Make flush decisions per column family' to trunk
> ------------------------------------------------------
>
> Key: HBASE-10201
> URL: https://issues.apache.org/jira/browse/HBASE-10201
> Project: HBase
> Issue Type: Improvement
> Components: wal
> Reporter: Ted Yu
> Assignee: zhangduo
> Priority: Critical
> Fix For: 2.0.0, 0.98.9, 0.99.2
>
> Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch,
> HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch,
> HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch,
> HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch,
> HBASE-10201_6.patch, HBASE-10201_7.patch
>
>
> Currently the flush decision is made using the aggregate size of all column
> families. When large and small column families co-exist, this causes many
> small flushes of the smaller CF. We need to make per-CF flush decisions.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)