[
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239062#comment-14239062
]
stack commented on HBASE-10201:
-------------------------------
Here is some feedback on reading through latest version of the patch. Lets
address these minor items after we are sure the most important part is working,
the sequenceid handling (I'm running tests here but its taking a while -- first
I need to prove that hbase 1.0 branch is healthy, then I intro your patch...
and [~jeffreyz], Mr SequenceId is going to take a look too). So hold on making
a patch till testing and Jeffrey's review are done.
Sorry this is taking so long to get in. On the one hand, you can tell we are
excited about getting this patch in because the improvement is really nice but
it touches a very sensitive part of hbase, the region sequenceid'ing, so we
need to exercise extra caution. Thanks for you patience [~Apache9]
Nits to be addressed on commit or if you make a new version of the patch
(you've done enough as it is -- smile -- and I could do below on commit np).
I can go over the javadoc on commit. Small edit would fix it all up nicely.
Below is a nit that can be addressed in a follow-on:
This config is not general. It belongs to a particular policy ("If
FlushLargeStoresPolicy is used..."):
hbase.hregion.percolumnfamilyflush.size.lower.bound Should probably have the
policy it is for in its name. Maybe just don't mention is in
hbase-default.xml. Let uses find it if they need it (16MB is a nice default
low-bound).
It is odd that this is public:
public static Class<? extends FlushPolicy> getFlushPolicyClass(
It is nice that the master tests that we can load a policy but it does not even
use flush policy (if we fail to load fall back to default with big warning?)
And flush policies are over in regionserver package so here we have master
reaching over and into the regionserver package. Would be good to avoid doing
this x-package reach especially when it does not seem to be needed.
I would think this would be an internal method for the factory to use?
Also in HTD, you call it getFlushPolicyClassName but here you call it
getFlushPolicyClass... would be good to be same.
This policy stuff you've added is nicer than what was here previous. Good one.
Should these two strings just be the same?
FLUSH_SIZE_LOWER_BOUND_KEY and DEFAULT_FLUSH_SIZE_LOWER_BOUND even though they
are read from different places? No harm the key being the same especially
since in HTD, you hide the key by providing getter/setters.
The FlushPolicy api is a little odd. It implements Configured but where do you
do a setConf on it? Then in the configureForRegion method, you take a Region
but all it is used for is to emit region name on Strings and to get instance of
HTableDescriptor. The flush takes a list of stores. Can't it get them from
the region it was given when configuredForRegion? This is a nit comment.
Ignore for now.
... Stopped at sequence id changes.... will be back. Thanks.
> Port 'Make flush decisions per column family' to trunk
> ------------------------------------------------------
>
> Key: HBASE-10201
> URL: https://issues.apache.org/jira/browse/HBASE-10201
> Project: HBase
> Issue Type: Improvement
> Components: wal
> Reporter: Ted Yu
> Assignee: zhangduo
> Priority: Critical
> Fix For: 1.0.0, 2.0.0, 0.98.9
>
> Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch,
> HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch,
> HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch,
> HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch,
> HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch,
> HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_2.patch,
> HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch,
> HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch,
> HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png
>
>
> Currently the flush decision is made using the aggregate size of all column
> families. When large and small column families co-exist, this causes many
> small flushes of the smaller CF. We need to make per-CF flush decisions.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)